In [1]:
import pandas as pd
import numpy as np
import spacy
from spacy import displacy

In [2]:
nlp = spacy.load("en_core_web_sm")

In [3]:
ner_pipeline_labels = nlp.get_pipe("ner").labels
ner_pipeline_labels

('CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART')

In [4]:
spacy.explain('ORG')

'Companies, agencies, institutions, etc.'

In [5]:
sample_text = """ The company was founded in December 2002 by Reid Hoffman and the founding team members from PayPal and Socialnet.com (Allen Blue, Eric Ly, Jean-Luc Vaillant, Lee Hower, Konstantin Guericke, Stephen Beitzel, David Eves, Ian McNish, Yan Pujante, Chris Saccheri).In late 2003, Sequoia Capital led the Series A investment in the company.In August 2004, LinkedIn reached 1 million users.In March 2006, LinkedIn achieved its first month of profitability.In April 2007, LinkedIn reached 10 million users.In February 2008, LinkedIn launched a mobile version of the site.

In June 2008, Sequoia Capital, Greylock Partners, and other venture capital firms purchased a 5% stake in the company for $53 million, giving the company a post-money valuation of approximately $1 billion. In November 2009, LinkedIn opened its office in Mumbai and soon thereafter in Sydney, as it started its Asia-Pacific team expansion. In 2010 LinkedIn opened an International Headquarters in Dublin, Ireland,received a $20 million investment from Tiger Global Management LLC at a valuation of approximately $2 billion,announced its first acquisition, Mspoke,and improved its 1% premium subscription ratio. In October of that year, Silicon Valley Insider ranked the company No. 10 on its Top 100 List of most valuable startups. By December, the company was valued at $1.575 billion in private markets. LinkedIn started its India operations in 2009 and a major part of the first year was dedicated to understanding professionals in India and educating members to leverage LinkedIn for career development.

LinkedIn office building at 222 Second Street in San Francisco (opened in March 2016)

LinkedIn office in Toronto inside the Toronto Eaton Centre

LinkedIn filed for an initial public offering in January 2011. The company traded its first shares on May 19, 2011, under the NYSE symbol "LNKD", at $45 per share. Shares of LinkedIn rose as much as 171% on their first day of trade on the New York Stock Exchange and closed at $94.25, more than 109% above IPO price. Shortly after the IPO, the site's underlying infrastructure was revised to allow accelerated revision-release cycles.In 2011 LinkedIn earned $154.6 million in advertising revenue alone, surpassing Twitter, which earned $139.5 million.LinkedIn's fourth-quarter 2011, earnings soared because of the company's increase in success in the social media world.[33] By this point LinkedIn had about 2,100 full-time employees compared to the 500 that it had in 2010.

In April 2014 LinkedIn announced that it had leased 222 Second Street, a 26-story building under construction in San Francisco's SoMa district, to accommodate up to 2,500 of its employees, with the lease covering 10 years.The goal was to join all San Francisco-based staff (1,250 as of January 2016) in one building, bringing sales and marketing employees together with the research and development team.They started to move in in March 2016. In February 2016 following an earnings report, LinkedIn's shares dropped 43.6% within a single day, down to $108.38 per share. LinkedIn lost $10 billion of its market capitalization that day.

In 2016 access to LinkedIn was blocked by Russian authorities for non-compliance with the 2015 national legislation that requires social media networks to store citizens' personal data on servers located in Russia.

In June 2016 Microsoft announced that it would acquire LinkedIn for $196 a share, a total value of $26.2 billion and the second largest acquisition made by Microsoft to date. The acquisition would be an all-cash, debt-financed transaction. Microsoft would allow LinkedIn to "retain its distinct brand, culture and independence", with Weiner to remain as CEO, who would then report to Microsoft CEO Satya Nadella. Analysts believed Microsoft saw the opportunity to integrate LinkedIn with its Office product suite to help better integrate the professional network system with its products. The deal was completed on December 8, 2016.

In late 2016 LinkedIn announced a planned increase of 200 new positions in its Dublin office, which would bring the total employee count to 1,200.Since 2017 94% of B2B marketers use LinkedIn to distribute content.

Soon after LinkedIn's acquisition by Microsoft, LinkedIn's new desktop version was introduced.The new version was meant to make the user experience seamless across mobile and desktop. Some of the changes were made according to the feedback received from the previously launched mobile app. Features that were not heavily used were removed. For example, the contact tagging and filtering features are not supported anymore.

Following the launch of the new user interface (UI), some users, complained about the missing features which were there in the older version, slowness, and bugs in it. The issues were faced by both free and premium users, and with both the desktop version and the mobile version of the site.

In 2019 LinkedIn launched globally the feature Open for Business that enables freelancers to be discovered on the platform.LinkedIn Events was launched in the same year.

In June 2020 Jeff Weiner stepped down as CEO and become executive chairman after 11 years in the role. Ryan Roslansky stepped up as CEO from his previous position as the senior vice president of product.In late July 2020, LinkedIn announced it laid off 960 employees, about 6 percent of total workforce, from the talent acquisition and global sales teams. In an email to all employees, CEO Ryan Roslansky said the cuts were due to effects of the global COVID-19 pandemic.In April 2021 CyberNews claimed that 500 million LinkedIn's accounts have leaked online.However, LinkedIn stated that "We have investigated an alleged set of LinkedIn data that has been posted for sale and have determined that it is actually an aggregation of data from a number of websites and companies".

In June 2021 PrivacySharks claimed that more than 700 million LinkedIn records was on sale on a hacker forum.LinkedIn later stated that this is not a breach, but scraped data which is also a violation of their Terms of Service.

Microsoft ended LinkedIn operations in China in October 2021"""

In [6]:
len(sample_text.split('.'))

59
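Splitting on every `.` only approximates the sentence count, because the periods inside "$1.575", "Socialnet.com" and "No. 10" also create pieces. The sketch below uses a rough heuristic, which is an assumption and not a spaCy feature: it splits after `.`, `!` or `?` only when the next non-space character is an uppercase letter. That still catches run-together sentences such as "users.In". The helper names are hypothetical.

```python
import re


def naive_piece_count(text):
    # What the cell above computes: pieces after splitting on every '.'
    return len(text.split("."))


def heuristic_sentence_count(text):
    # Split after '.', '!' or '?' only when the next non-space character is
    # an uppercase letter, so decimals ($1.5), domains (Socialnet.com) and
    # "No. 10" stay intact while "2002.In" is still split
    pieces = re.split(r"(?<=[.!?])\s*(?=[A-Z])", text.strip())
    return len([piece for piece in pieces if piece.strip()])


sample = "Founded in 2002.In 2003, it raised $1.5 million. It ranked No. 10 on Socialnet.com."
print(naive_piece_count(sample), heuristic_sentence_count(sample))
```

The heuristic still over-splits an abbreviation followed by a capitalized word (for example "Mr. Smith"). Iterating over `ner_text.sents` and using spaCy's own sentence segmentation is the more robust option.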

In [7]:
ner_text = nlp(sample_text)

In [8]:
# Each entity is a Span: its text, predicted label and character offsets
for ent in ner_text.ents:
    print(ent.text, ent.label_, ent.start_char, ent.end_char)

December 2002 DATE 28 41
Reid Hoffman PERSON 45 57
PayPal ORG 93 99
Socialnet.com ORG 104 117
Allen Blue PERSON 119 129
Eric Ly PERSON 131 138
Jean-Luc Vaillant ORG 140 157
Lee Hower PERSON 159 168
Konstantin Guericke PERSON 170 189
Stephen Beitzel PERSON 191 206
David Eves PERSON 208 218
Ian McNish PERSON 220 230
Yan Pujante PERSON 232 243
Chris PERSON 245 250
late 2003 DATE 264 273
Sequoia Capital ORG 275 290
Series EVENT 299 305
August 2004 DATE 337 348
LinkedIn GPE 350 358
1 million CARDINAL 367 376
March 2006 DATE 386 396
LinkedIn GPE 398 406
April 2007 DATE 452 462
LinkedIn GPE 464 472
10 million CARDINAL 481 491
February 2008 DATE 501 514
LinkedIn GPE 516 524
June 2008 DATE 568 577
Sequoia Capital ORG 579 594
Greylock Partners ORG 596 613
5% PERCENT 659 661
$53 million MONEY 687 698
approximately $1 billion MONEY 745 769
November 2009 DATE 774 787
LinkedIn GPE 789 797
Mumbai GPE 819 825
Sydney GPE 849 855
Asia LOC 875 879
2010 DATE 907 911
LinkedIn GPE 912 920
Dublin GPE 961 967
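Printed spans are fine for a quick look. A table makes it easier to filter the results and to spot mislabels such as `LinkedIn` tagged `GPE` or `Jean-Luc Vaillant` tagged `ORG` in the output above. This small sketch collects any iterable of spaCy `Span` objects into a DataFrame and reads only `text`, `label_`, `start_char` and `end_char`. The helper name `ents_to_frame` is hypothetical.

```python
import pandas as pd


def ents_to_frame(ents):
    """One row per entity span, with its label and character offsets."""
    rows = [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in ents]
    return pd.DataFrame(rows, columns=["text", "label", "start_char", "end_char"])
```

For example, `ents_to_frame(ner_text.ents).query("label == 'GPE'")` lists every span the model placed under `GPE`.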

In [10]:
len([ent for ent in ner_text.ents if ent.label_ == 'MONEY'])

14
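The same pattern works for every label at once. The sketch below uses `collections.Counter`, and the helper names are hypothetical. It counts spans, so a string the model finds several times (such as `LinkedIn`) is counted once per occurrence. The second helper lists each distinct string only once.

```python
from collections import Counter


def label_counts(ents):
    # Number of entity spans the model assigned to each label
    return Counter(ent.label_ for ent in ents)


def distinct_by_label(ents, label):
    # Unique entity strings for one label, in first-seen order
    return list(dict.fromkeys(ent.text for ent in ents if ent.label_ == label))
```

`label_counts(ner_text.ents)["MONEY"]` should give the same number as the cell above. `distinct_by_label(ner_text.ents, "MONEY")` lists the unique amounts.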

In [12]:
displacy.render(ner_text, style="ent", jupyter=True)

In [13]:
!pip install spacy-annotator

Collecting spacy-annotator
  Downloading spacy_annotator-2.1.3-py3-none-any.whl (6.0 kB)
Installing collected packages: spacy-annotator


Successfully installed spacy-annotator-2.1.3


In [14]:
import spacy_annotator as spa
from pprint import pprint

In [15]:
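# One entry per line of the text; dropping empty strings removes the blank lines between paragraphs
sample_text_list = sample_text.split('\n')
sample_text_list = [item for item in sample_text_list if item]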
print(len(sample_text_list))
print(sample_text_list)

15
[' The company was founded in December 2002 by Reid Hoffman and the founding team members from PayPal and Socialnet.com (Allen Blue, Eric Ly, Jean-Luc Vaillant, Lee Hower, Konstantin Guericke, Stephen Beitzel, David Eves, Ian McNish, Yan Pujante, Chris Saccheri).In late 2003, Sequoia Capital led the Series A investment in the company.In August 2004, LinkedIn reached 1 million users.In March 2006, LinkedIn achieved its first month of profitability.In April 2007, LinkedIn reached 10 million users.In February 2008, LinkedIn launched a mobile version of the site.', 'In June 2008, Sequoia Capital, Greylock Partners, and other venture capital firms purchased a 5% stake in the company for $53 million, giving the company a post-money valuation of approximately $1 billion. In November 2009, LinkedIn opened its office in Mumbai and soon thereafter in Sydney, as it started its Asia-Pacific team expansion. In 2010 LinkedIn opened an International Headquarters in Dublin, Ireland,received a $20 m
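The list above keeps the leading space of the first paragraph (`' The company ...'`). The variant below trims each paragraph and also drops lines that contain only whitespace. `split_paragraphs` is a hypothetical helper, not part of pandas or spaCy.

```python
def split_paragraphs(text):
    # One entry per non-blank line, with surrounding whitespace removed
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Passing `split_paragraphs(sample_text)` to `pd.DataFrame({'text': ...})` would give the same rows as the next cell, minus the stray leading space.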

In [17]:
df = pd.DataFrame({'text': sample_text_list})

In [18]:
df

   text
0  The company was founded in December 2002 by R...
1  In June 2008, Sequoia Capital, Greylock Partne...
2  LinkedIn office building at 222 Second Street ...
3  LinkedIn office in Toronto inside the Toronto ...
4  LinkedIn filed for an initial public offering ...
5  In April 2014 LinkedIn announced that it had l...
6  In 2016 access to LinkedIn was blocked by Russ...
7  In June 2016 Microsoft announced that it would...
8  In late 2016 LinkedIn announced a planned incr...
9  Soon after LinkedIn's acquisition by Microsoft...


In [19]:
nlp = spacy.load("en_core_web_sm")

In [22]:
annotator = spa.Annotator(labels=['company', 'person', 'money', 'place', 'date'], model=nlp)
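The annotator's labels do not match the model's label set, so pre-filling the form from `en_core_web_sm` needs a mapping. The sketch below assumes one. `LABEL_MAP` and `suggest_annotations` are hypothetical and are not part of spacy-annotator, independent of whatever the `model` argument does inside it. It joins entities with ", " to match the form's `ent one, ent two, ent three` placeholder. Model mistakes carry over: `LinkedIn` is tagged `GPE` in the output above, so this mapping would suggest it as a `place`. The suggestions therefore still need a human review.

```python
# Hypothetical mapping from en_core_web_sm labels to the annotator's custom labels
LABEL_MAP = {
    "ORG": "company",
    "PERSON": "person",
    "MONEY": "money",
    "GPE": "place",
    "LOC": "place",
    "FAC": "place",
    "DATE": "date",
}


def suggest_annotations(ents, label_map=LABEL_MAP):
    """Group model entities under the custom labels as comma-separated strings."""
    # One (initially empty) bucket per custom label, in first-seen order
    grouped = {label: [] for label in dict.fromkeys(label_map.values())}
    for ent in ents:
        target = label_map.get(ent.label_)  # None for unmapped labels such as CARDINAL
        if target is not None and ent.text not in grouped[target]:
            grouped[target].append(ent.text)
    return {label: ", ".join(texts) for label, texts in grouped.items()}
```

For example, `suggest_annotations(nlp(df.loc[1, 'text']).ents)` returns one string per custom label for the second paragraph.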

In [23]:
df_labels = annotator.annotate(df=df, col_text='text')

[Interactive annotation form (ipywidgets): progress line "-1 examples annotated, 16 examples left", one text field per label (company, person, money, place, date) with the placeholder "ent one, ent two, ent three", and a submit button]
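After the form is filled in, each entity string has to become character offsets before it can serve as spaCy training data in the `(text, {"entities": [(start, end, label)]})` format. The sketch below illustrates that conversion. It is not spacy-annotator's implementation, and `to_spacy_entities` is hypothetical. It marks every exact occurrence of each string, including matches inside longer words. It resolves overlaps by keeping the earliest, then longest, span. Because it splits on commas, an entity that itself contains a comma (such as "Dublin, Ireland") cannot be entered as one span.

```python
import re


def to_spacy_entities(text, annotations):
    """Convert {label: "ent one, ent two"} form input into spaCy-style offsets."""
    spans = []
    for label, raw in annotations.items():
        for entity in (part.strip() for part in raw.split(",")):
            if not entity:
                continue  # skip empty fields and stray commas
            # Mark every exact (case-sensitive) occurrence of the string
            for match in re.finditer(re.escape(entity), text):
                spans.append((match.start(), match.end(), label))

    # Earliest start first; for equal starts, prefer the longer span
    spans.sort(key=lambda span: (span[0], -(span[1] - span[0])))
    kept, last_end = [], 0
    for start, end, label in spans:
        if start >= last_end:  # spaCy entities must not overlap
            kept.append((start, end, label))
            last_end = end
    return text, {"entities": kept}
```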