# People Search Engine
This notebook shows how to use a people search engine that draws on [Wikipedia](https://www.wikipedia.org), [Wikidata](https://www.wikidata.org), [Twitter](https://twitter.com), [Google](https://google.com), and [LinkedIn](https://linkedin.com). See README.md for the list of required packages and installation instructions.

## Import Modules

In [1]:
from Person import Person
from WikiPeopleFinder import WikiPeopleFinder
from TwitterPeopleFinder import TwitterPeopleFinder
from GoogleNetWorthFinder import GoogleNetWorthFinder
from LinkedinPeopleFinder import LinkedinPeopleFinder

## Define an instance of the Person class

In [7]:
# instantiate a 'person' instance of the Person class given the first name and last name
person = Person(first_name="Jim", last_name="Carrey")

## Wikipedia People Finder
This people finder systematically combines data from [Wikipedia](https://www.wikipedia.org) and [Wikidata](https://www.wikidata.org).
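Wikidata stores dates as time values such as `+1962-01-17T00:00:00Z`, while `person.raw` reports the date of birth as `17-01-1962`. A minimal sketch of that conversion, assuming the DD-MM-YYYY format seen in the output; `wikidata_time_to_dmy` is a hypothetical helper, not part of `WikiPeopleFinder`:

```python
def wikidata_time_to_dmy(value: str) -> str:
    """Convert a Wikidata time value such as '+1962-01-17T00:00:00Z' to 'DD-MM-YYYY'.

    Hypothetical helper: it only illustrates the format change seen in person.raw.
    """
    sign, rest = value[0], value[1:]  # Wikidata prefixes every year with '+' or '-'
    year, month, day = rest.split("T")[0].split("-")
    if sign == "-":
        year = "-" + year  # keep years before the common era negative
    return f"{day}-{month}-{year}"


print(wikidata_time_to_dmy("+1962-01-17T00:00:00Z"))  # 17-01-1962
```

For dates known only to the year, Wikidata may use `00` for the month and day; this sketch passes those through unchanged.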

In [8]:
# instantiate the WikiPeopleFinder class
wikifinder = WikiPeopleFinder()

In [9]:
# Search for the person; the attributes found are stored on the 'person' instance
wikifinder.find(person)

In [10]:
person.raw

{'first_name': 'Jim',
 'middle_name': '',
 'last_name': 'Carrey',
 'nationality': 'US',
 'domicile': '',
 'date_of_birth': '17-01-1962',
 'occupation': 'comedian, film producer, voice actor, film actor, screenwriter, television actor, anti-vaccine activist',
 'net_worth': '',
 'is_famous': 'True',
 'description': "James Eugene Carrey (born January 17, 1962) is a Canadian-American actor, comedian, impressionist, screenwriter, musician, producer and painter. He is known for his energetic slapstick performances.Carrey first gained recognition in America in 1990 after landing a recurring role in the sketch comedy television series In Living Color. His first leading roles in major productions came with Ace Ventura: Pet Detective (1994), Dumb and Dumber (1994), The Mask (1994), and Ace Ventura: When Nature Calls (1995), as well as a supporting role in Batman Forever (1995) and a lead role in Liar Liar (1997). He gained critical acclaim starring in serious roles in The Truman Show (1998) and 
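`person.raw` is a flat dictionary of string fields; even `is_famous` is stored as the string `'True'`. The hypothetical `PersonRecord` dataclass below is not the repository's `Person` class; it is only a sketch that mirrors the keys shown above, which are also the columns every finder's `find_as_df` output uses in this notebook:

```python
from dataclasses import asdict, dataclass


@dataclass
class PersonRecord:
    """Hypothetical stand-in mirroring the keys of person.raw (all values are strings)."""
    first_name: str = ""
    middle_name: str = ""
    last_name: str = ""
    nationality: str = ""
    domicile: str = ""
    date_of_birth: str = ""  # DD-MM-YYYY, e.g. '17-01-1962'
    occupation: str = ""     # comma-separated list of occupations
    net_worth: str = ""      # human-readable, e.g. '150 million USD'
    is_famous: str = ""      # stored as a string such as 'True'
    description: str = ""

    @property
    def raw(self) -> dict:
        # asdict keeps the field order, matching the column order of find_as_df
        return asdict(self)


record = PersonRecord(first_name="Jim", last_name="Carrey")
print(list(record.raw))
```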

In [8]:
wikifinder.find_as_df(person)

| | first_name | middle_name | last_name | nationality | domicile | date_of_birth | occupation | net_worth | is_famous | description |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jim | | Carrey | US | | 17-01-1962 | comedian, film producer, voice actor, film act... | | True | James Eugene Carrey (born January 17, 1962) is... |


## Google Net Worth Finder
This finder extracts the net worth that Google reports at the top of its search results, so it only works for people who are famous enough to have one listed. You can also take a person from the Twitter or LinkedIn search results and **add** their **missing** net worth with the `GoogleNetWorthFinder` class.
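The extracted value is a human-readable string such as `'150 million USD'`. To sort or compare people by net worth, you could convert it to a number. A minimal sketch, assuming the `<amount> <scale> <currency>` layout seen in this notebook; `parse_net_worth` and its scale table are hypothetical:

```python
from typing import Optional, Tuple

# Multipliers for the scale words assumed to appear in the net worth string.
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def parse_net_worth(text: str) -> Optional[Tuple[float, str]]:
    """Parse e.g. '150 million USD' into (150000000.0, 'USD'); return None if empty."""
    parts = text.split()
    if not parts:
        return None  # no net worth was found (empty string in person.raw)
    amount = float(parts[0].replace(",", ""))  # assumes a plain number like '150' or '1,500'
    multiplier = 1
    currency = ""
    for word in parts[1:]:
        if word.lower() in SCALES:
            multiplier = SCALES[word.lower()]
        else:
            currency = word  # anything that is not a scale word is taken as the currency
    return amount * multiplier, currency


print(parse_net_worth("150 million USD"))  # (150000000.0, 'USD')
print(parse_net_worth(""))                 # None
```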

In [11]:
ggwealthfinder = GoogleNetWorthFinder()

In [12]:
ggwealthfinder.extract_net_worth(person)

In [13]:
person.raw

{'first_name': 'Jim',
 'middle_name': '',
 'last_name': 'Carrey',
 'nationality': 'US',
 'domicile': '',
 'date_of_birth': '17-01-1962',
 'occupation': 'comedian, film producer, voice actor, film actor, screenwriter, television actor, anti-vaccine activist',
 'net_worth': '150 million USD',
 'is_famous': 'True',
 'description': "James Eugene Carrey (born January 17, 1962) is a Canadian-American actor, comedian, impressionist, screenwriter, musician, producer and painter. He is known for his energetic slapstick performances.Carrey first gained recognition in America in 1990 after landing a recurring role in the sketch comedy television series In Living Color. His first leading roles in major productions came with Ace Ventura: Pet Detective (1994), Dumb and Dumber (1994), The Mask (1994), and Ace Ventura: When Nature Calls (1995), as well as a supporting role in Batman Forever (1995) and a lead role in Liar Liar (1997). He gained critical acclaim starring in serious roles in The Truman S

## Twitter People Finder
To use the Twitter people finder, you need to provide your `CONSUMER_KEY`, `CONSUMER_SECRET`, `OAUTH_TOKEN`, and `OAUTH_TOKEN_SECRET`. If you don't have these credentials, see Twitter's [access token guide](https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html). If you don't already have a Twitter app, it may take a day or more to get your tokens.

In [4]:
# It is assumed that your authorization keys are stored in 'twitter_tokens.csv',
# one per line, with a space between each key name and its value.
# Make sure the path below points to your own tokens folder.
with open('../../tokens/twitter_tokens.csv', 'r') as fl:
    # skip blank lines; the first token is the key name, the second its value
    tokens = {line.split()[0]: line.split()[1] for line in fl if line.strip()}

# Alternatively, paste your keys directly here
CONSUMER_KEY = tokens['CONSUMER_KEY']
CONSUMER_SECRET = tokens['CONSUMER_SECRET']
OAUTH_TOKEN = tokens['OAUTH_TOKEN']
OAUTH_TOKEN_SECRET = tokens['OAUTH_TOKEN_SECRET']
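If you would rather not keep the keys in a file, you can read them from environment variables instead. A minimal sketch, assuming environment variables named after the four keys and falling back to the space-separated file format above; `load_twitter_tokens` is a hypothetical helper:

```python
import os
from typing import Dict

KEY_NAMES = ("CONSUMER_KEY", "CONSUMER_SECRET", "OAUTH_TOKEN", "OAUTH_TOKEN_SECRET")


def load_twitter_tokens(path: str) -> Dict[str, str]:
    """Prefer environment variables; read any missing key from a 'KEY value' file."""
    tokens = {name: os.environ[name] for name in KEY_NAMES if name in os.environ}
    missing = [name for name in KEY_NAMES if name not in tokens]
    if missing and os.path.exists(path):
        with open(path) as fl:
            for line in fl:
                parts = line.split()
                # only fill keys that the environment did not provide
                if len(parts) >= 2 and parts[0] in missing:
                    tokens[parts[0]] = parts[1]
    still_missing = [name for name in KEY_NAMES if name not in tokens]
    if still_missing:
        raise KeyError("Missing Twitter credentials: " + ", ".join(still_missing))
    return tokens
```

For example, `tokens = load_twitter_tokens('../../tokens/twitter_tokens.csv')` returns the same dictionary as the cell above, except that any key set in the environment takes precedence.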

In [5]:
twitterfinder = TwitterPeopleFinder(CONSUMER_KEY, CONSUMER_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

In [8]:
# Raw Twitter results (i.e., generic Twitter user data)
twitterfinder.find_users_as_df(person)

| | id_str | listed_name | screen_name | location | description | created_at | verified | followers_count | followings_count | listed_count | favourites_count | statuses_count | contributors_enabled | default_profile | protected | url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52551600 | Jim Carrey | JimCarrey | Los Angeles | The ONLY official social media account for Act... | 2009-06-30 22:58:44 | True | 18087153 | 1 | 58563 | 8 | 4395 | False | False | False | |
| 1 | 35448221 | Jim Carrey Online | JimCarreyOnline | Everywhere | Officially unofficial fansite for Jim Carrey. ... | 2009-04-26 11:29:15 | False | 22742 | 199 | 204 | 30 | 6305 | False | False | False | http://t.co/BePLgzYv4G |
| 2 | 60376578 | El fan de Jim | EsJimCarrey | México | IMPORTANT: I hereby declare... I'm not real Ji... | 2009-07-26 19:30:13 | False | 25398 | 19168 | 133 | 1513 | 296225 | False | False | False | |
| 3 | 3073056407 | jim carrey | jimworld_0 | Los Angeles, CA | the private place for actor Jim Carrey | 2015-03-05 18:31:57 | False | 2572 | 4982 | 10 | 66347 | 17032 | False | True | False | |
| 4 | 202784005 | Reed Campbell 😎 | iamreedcampbell | California/Michigan/New York | I WILL play Jim Carrey in his biopic. @WhyNotW... | 2010-10-14 20:29:40 | False | 1759 | 883 | 26 | 5811 | 12221 | False | False | False | |
| 5 | 3185701728 | Han♡Till The End | _JimCarreyFan42 | | not really a jim carrey fan page ☁️💧6/8/18💛 | 2015-05-05 01:40:14 | False | 598 | 507 | 6 | 17376 | 12737 | False | True | False | |
| 6 | 3134013126 | Casey Lee Williams | caseylwilliams | | I'm Casey! A singer, most known for work in Ro... | 2015-04-03 03:50:16 | False | 21922 | 576 | 43 | 9819 | 632 | False | False | False | https://t.co/mX69uodOJ0 |
| 7 | 2886177392 | Jim Carrey | JimFunney | | The Jim Carrey So funny it hurts Parody! NOT a... | 2014-11-01 11:01:23 | False | 16981 | 6 | 12 | 0 | 1536 | False | True | False | |
| 8 | 2730384818 | blakelyn | KianAndHazey | Louisiana, USA | “Forget the pain, mock the pain, reduce it. An... | 2014-08-13 21:57:48 | False | 2239 | 912 | 45 | 29759 | 29312 | False | True | False | |
| 9 | 26602679 | JIM CARREY WOKE | MARSHMELLOWPIES | ECHO PARK | Photographer/Videographer askMarcial@gmail.com | 2009-03-25 21:55:14 | False | 3540 | 171 | 61 | 301 | 5490 | False | False | False | https://t.co/4n2UvIHWLU |


In [None]:
# Get a list of Person instances that match 'person'
persons = twitterfinder.find(person)

In [9]:
# Get the results as a pandas DataFrame
twitterfinder.find_as_df(person)

| | first_name | middle_name | last_name | nationality | domicile | date_of_birth | occupation | net_worth | is_famous | description |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jim | | Carrey | | US | | actor | | True | The ONLY official social media account for Act... |
| 1 | Jim | Carrey | Online | | | | | | False | Officially unofficial fansite for Jim Carrey. ... |
| 2 | jim | | carrey | | US | | actor | | False | the private place for actor Jim Carrey |
| 3 | Jim | | Carrey | | | | | | False | The Jim Carrey So funny it hurts Parody! NOT a... |
| 4 | JIM | CARREY | WOKE | | AU | | photographer | | False | Photographer/Videographer askMarcial@gmail.com |
| 5 | Jim | | Carrey | | US | | actor | | False | Actor, ideologist, laughs, parody |
| 6 | Jim | Carrey's | Beard | | | | | | False | There is but 1, and we're it!!! |
| 7 | Jim | Carrey's | beard | | CA | | | | False | The question isn't why am I growing a beard, t... |
| 8 | Not | Jim | Carrey | | | | | | False | ALRIGHTY THEN! Bringing you the funniest tweet... |
| 9 | \*Jim | Carrey | voice\* | | US | | artist, writer | | False | artist. writer. content creative. food lover. ... |
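Comparing this table with the raw user table above suggests how raw Twitter users become `Person` rows: accounts whose listed name lacks either the first or the last name are dropped, the listed name is split on whitespace into first, middle, and last name (so `Jim Carrey Online` yields middle name `Carrey` and last name `Online`), and only the verified account is marked `is_famous`. The sketch below reproduces that inferred behaviour; it is an assumption based on the outputs, not the actual `TwitterPeopleFinder` code, and `user_to_raw` is hypothetical:

```python
from typing import Dict, Optional


def user_to_raw(user: Dict, first: str, last: str) -> Optional[Dict[str, str]]:
    """Map a raw Twitter user dict to Person-style fields, or None if the name does not match."""
    name = user["listed_name"]
    lowered = name.lower()
    # Loose, case-insensitive match: both query names must occur in the listed name.
    if first.lower() not in lowered or last.lower() not in lowered:
        return None
    tokens = name.split()
    return {
        "first_name": tokens[0],
        "middle_name": " ".join(tokens[1:-1]),  # everything between first and last token
        "last_name": tokens[-1] if len(tokens) > 1 else "",
        "is_famous": str(bool(user.get("verified", False))),  # assumed to follow 'verified'
        "description": user.get("description", ""),
    }


print(user_to_raw({"listed_name": "Jim Carrey Online", "verified": False}, "Jim", "Carrey"))
# {'first_name': 'Jim', 'middle_name': 'Carrey', 'last_name': 'Online', 'is_famous': 'False', 'description': ''}
```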


You can use the `strict_match` flag to report only the results whose first and last names exactly match those of the person. The comparison is case-insensitive, as the `jim carrey` row below shows.

In [10]:
twitterfinder.find_as_df(person, strict_match=1)

| | first_name | middle_name | last_name | nationality | domicile | date_of_birth | occupation | net_worth | is_famous | description |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jim | | Carrey | | US | | actor | | True | The ONLY official social media account for Act... |
| 1 | jim | | carrey | | US | | actor | | False | the private place for actor Jim Carrey |
| 2 | Jim | | Carrey | | | | | | False | The Jim Carrey So funny it hurts Parody! NOT a... |
| 3 | Jim | | Carrey | | US | | actor | | False | Actor, ideologist, laughs, parody |
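The strict output keeps `jim carrey` but drops `Jim Carrey Online`, `JIM CARREY WOKE`, and `Not Jim Carrey`, which is consistent with a case-insensitive comparison of the parsed first and last names. A minimal sketch of such a check under that assumption (`is_strict_match` is hypothetical):

```python
def is_strict_match(first_name: str, last_name: str, query_first: str, query_last: str) -> bool:
    """Case-insensitive exact comparison of the parsed first and last names."""
    return (first_name.casefold() == query_first.casefold()
            and last_name.casefold() == query_last.casefold())


rows = [("Jim", "Carrey"), ("Jim", "Online"), ("jim", "carrey"), ("Not", "Carrey")]
print([row for row in rows if is_strict_match(*row, "Jim", "Carrey")])
# [('Jim', 'Carrey'), ('jim', 'carrey')]
```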


You can also set more attributes on the person instance, such as `domicile`, to narrow down the results.

In [15]:
person_ca = Person(first_name="Jim", last_name="Carrey", domicile='CA')

In [16]:
twitterfinder.find_as_df(person_ca)

| | first_name | middle_name | last_name | nationality | domicile | date_of_birth | occupation | net_worth | is_famous | description |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jim | Carrey | Online | | | | | | False | Officially unofficial fansite for Jim Carrey. ... |
| 1 | Jim | | Carrey | | | | | | False | The Jim Carrey So funny it hurts Parody! NOT a... |
| 2 | Jim | Carrey's | Beard | | | | | | False | There is but 1, and we're it!!! |
| 3 | Jim | Carrey's | beard | | CA | | | | False | The question isn't why am I growing a beard, t... |
| 4 | Not | Jim | Carrey | | | | | | False | ALRIGHTY THEN! Bringing you the funniest tweet... |
| 5 | 🇵🇷Jim | | Carrey | | | | | | False | The Puerto Rican Jim Carrey #Aquarius ♒️ #Trus... |
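Compared with the earlier search without a domicile, the accounts with a known domicile other than `CA` (`US`, `AU`) are gone, while accounts with an unknown domicile remain. A minimal sketch of a filter with that inferred behaviour; `filter_by_domicile` is hypothetical and not taken from the finder's code:

```python
from typing import Dict, List


def filter_by_domicile(rows: List[Dict[str, str]], domicile: str) -> List[Dict[str, str]]:
    """Keep rows whose domicile matches the query or is unknown (empty)."""
    if not domicile:
        return rows  # no domicile given: nothing to filter on
    return [row for row in rows if row.get("domicile", "") in ("", domicile)]


rows = [
    {"last_name": "Carrey", "domicile": "US"},
    {"last_name": "Online", "domicile": ""},
    {"last_name": "beard", "domicile": "CA"},
    {"last_name": "WOKE", "domicile": "AU"},
]
print([row["last_name"] for row in filter_by_domicile(rows, "CA")])  # ['Online', 'beard']
```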


## LinkedIn People Finder
The LinkedIn people finder yields more complete results than the Twitter finder. However, LinkedIn strictly limits the number of queries that can be made from a single IP address. There are some complex workarounds that help to some extent (e.g., using proxies, refreshing cookies, or rotating IP addresses where possible). For this reason, I added an `offline` flag to the find methods of the `LinkedinPeopleFinder` class, which lets you feed in an already downloaded HTML file.
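One way to combine the `offline` flag with live queries is to cache each downloaded results page on disk, so that LinkedIn is queried at most once per name. A minimal sketch, assuming you supply your own `fetch(first, last)` function that returns the page HTML; `cached_html` and the cache layout are hypothetical and not part of `LinkedinPeopleFinder`:

```python
import os
from typing import Callable


def cached_html(cache_dir: str, first: str, last: str,
                fetch: Callable[[str, str], str]) -> str:
    """Return the path of a cached results page, downloading it only if it is not cached yet."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{first}_{last}.html")
    if not os.path.exists(path):
        html = fetch(first, last)  # the only place a network request would happen
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(html)
    return path
```

The returned path can then be passed to the finder as in the cells below, e.g. `linkedfinder.find_as_df(person=person, path=path, offline=1)`.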

In [17]:
linkedfinder = LinkedinPeopleFinder()

In [21]:
person = Person("John", "Smith")

In [22]:
path = '../data/sample_linkedin_html/66800_John_Smith_profiles _ LinkedIn.html'

In [23]:
linkedfinder.find_users_as_df(person, path)

| | listed_name | headline | location | industry | current | past | education | summary | link |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Christopher John Smith | Sales Director - Rainmaker, Modernizing Sales | Greater Atlanta Area | Computer Software | Sales Director at SalesLoft, Sr. Commercial Ac... | Fitness Coach at Orangetheory Fitness, Account... | Northern Kentucky University | #Salesmith. Investor. Entrepreneur Mindset. Se... | https://www.linkedin.com/in/christophersmithsa... |
| 1 | JOHN SMITH | Talent Acquisition manager | United Arab Emirates | Staffing and Recruiting | Talent acquisition manager at Arabtec Construc... | | | | https://www.linkedin.com/in/john-smith-630428b0 |
| 2 | John Smith | Building Plans Examiner at City of Punta Gorda | Punta Gorda, Florida Area | Government Relations | Building Plans Examiner at City of Punta Gorda | President at Smittys Construction Inc, Sheet M... | Gloucester County College, Burlington County V... | | https://www.linkedin.com/in/john-smith-4a9310a4 |
| 3 | John Smith | Player agent/Upsl Representative | United States | Professional Training & Coaching | Manager at Self-employed | | London Business School, University of Californ... | to contact me do it through whatsapp \n\nor sm... | https://www.linkedin.com/in/john-smith-a1061289 |
| 4 | John Smith | President & CEO | Greater Memphis Area | Transportation/Trucking/Railroad | President & CEO at FedEx Freight, Senior ... | | Northwestern State University | John A. Smith is president and chief executive... | https://www.linkedin.com/in/john-a-smith1 |
| 5 | John Smith, iMBA, iPhD | CEO Founder/Chair - Leadership Archaeologist, ... | Orlando, Florida Area | Professional Training & Coaching | Founder / Chair at College of Excellence Onlin... | Founder Trustee at Florida Fellowship Foundati... | Prager University, College of Executives Onlin... | What is a Collective? A leadership community t... | https://www.linkedin.com/in/johnsmithceo |
| 6 | John Smith | Freelance IT Consultant | United Kingdom | Information Technology and Services | Freelance IT Consultant at Freelance IT Consu... | Freelance IT Consultant at Freelance IT Consul... | Talent Acquisition | Experienced Technical Recruiter with a demonst... | https://www.linkedin.com/in/john-smith-7a5666120 |
| 7 | John Smith | Content Manager at Bitforx | Georgia | Information Technology and Services | Content Manager at Bitforx, Blockchain Develop... | | Eindhoven University of Technology | We have a Transparent Team of Educated Profess... | https://www.linkedin.com/in/john-smith-899882144 |
| 8 | John Smith | Bounty Hunter | United States | Marketing and Advertising | | | | | https://www.linkedin.com/in/john-smith-274aa2131 |
| 9 | John Smith | Manager Business Development \| Qualified Goog... | Indore Area, India | Information Technology and Services | Vice President Of Business Development at Conf... | Manager Business Development at Salasar Cyber ... | APS University Rewa, jawahar navodaya vidyalaya | Having 5+ years of Experience as a Business an... | https://www.linkedin.com/in/john-smith-b3421b124 |


In [24]:
linkedfinder.find_as_df(person=person, path=path, offline=1)

| | first_name | middle_name | last_name | nationality | domicile | date_of_birth | occupation | net_worth | is_famous | description |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Christopher | John | Smith | | US | | current: Sales Director at SalesLoft, Sr. Comm... | | | Northern Kentucky University; #Salesmith. Inve... |
| 1 | JOHN | | SMITH | | AE | | current: Talent acquisition manager at Arabtec... | | | ; Talent Acquisition manager |
| 2 | John | | Smith | | US | | current: Building Plans Examiner at City of Pu... | | | Gloucester County College, Burlington County V... |
| 3 | John | | Smith | | US | | current: Manager at Self-employed; past: | | | London Business School, University of Californ... |
| 4 | John | | Smith | | US | | current: President & CEO at FedEx Freight... | | | Northwestern State University; John A. Smith i... |
| 5 | iMBA | | John Smith | | US | | current: Founder / Chair at College of Excelle... | | | Prager University, College of Executives Onlin... |
| 6 | John | | Smith | | GB | | current: Freelance IT Consultant at Freelance... | | | Talent Acquisition; Experienced Technical Recr... |
| 7 | John | | Smith | | GE | | current: Content Manager at Bitforx, Blockchai... | | | Eindhoven University of Technology; We have a ... |
| 8 | John | | Smith | | US | | Marketing and Advertising | | | ; Bounty Hunter |
| 9 | John | | Smith | | IN | | current: Vice President Of Business Developmen... | | | APS University Rewa, jawahar navodaya vidyalay... |


**Strict name match**:

In [25]:
linkedfinder.find_as_df(person=person, path=path, offline=1, strict_match=1)

| | first_name | middle_name | last_name | nationality | domicile | date_of_birth | occupation | net_worth | is_famous | description |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | JOHN | | SMITH | | AE | | current: Talent acquisition manager at Arabtec... | | | ; Talent Acquisition manager |
| 1 | John | | Smith | | US | | current: Building Plans Examiner at City of Pu... | | | Gloucester County College, Burlington County V... |
| 2 | John | | Smith | | US | | current: Manager at Self-employed; past: | | | London Business School, University of Californ... |
| 3 | John | | Smith | | US | | current: President & CEO at FedEx Freight... | | | Northwestern State University; John A. Smith i... |
| 4 | John | | Smith | | GB | | current: Freelance IT Consultant at Freelance... | | | Talent Acquisition; Experienced Technical Recr... |
| 5 | John | | Smith | | GE | | current: Content Manager at Bitforx, Blockchai... | | | Eindhoven University of Technology; We have a ... |
| 6 | John | | Smith | | US | | Marketing and Advertising | | | ; Bounty Hunter |
| 7 | John | | Smith | | IN | | current: Vice President Of Business Developmen... | | | APS University Rewa, jawahar navodaya vidyalay... |


**Match domicile**:

In [26]:
person = Person(first_name="John", last_name="Smith", domicile='US')

In [27]:
linkedfinder.find_as_df(person=person, path=path, offline=1, strict_match=1)

| | first_name | middle_name | last_name | nationality | domicile | date_of_birth | occupation | net_worth | is_famous | description |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | John | | Smith | | US | | current: Building Plans Examiner at City of Pu... | | | Gloucester County College, Burlington County V... |
| 1 | John | | Smith | | US | | current: Manager at Self-employed; past: | | | London Business School, University of Californ... |
| 2 | John | | Smith | | US | | current: President & CEO at FedEx Freight... | | | Northwestern State University; John A. Smith i... |
| 3 | John | | Smith | | US | | Marketing and Advertising | | | ; Bounty Hunter |
