## Homework 1: Advanced Track -- Harvest the Twitter API

**Objective:** Write a series of functions that allow you to dynamically harvest Twitter data.

**Estimated Time to Complete:** 4-12 hours

#### Sections

 - **Section 1:** Setting up your developer account, using OAuth1 authentication (approx 45-120 minutes)
 - **Sections 2 & 3:** Navigating the API documentation, getting your first query strings (approx 45-120 minutes)
 - **Section 3:** Writing your API calls (approx 90 - 360 minutes)
 
#### What You'll Turn In:  
 - A `.py` (not a Notebook!) file that contains the functions that you were prompted to create.  These should contain comments demonstrating why your code does what it does, and after it's run, the instructor should be able to make the appropriate function calls in Spyder or any other IDE.

## Section 1:  Setting Up Your Developer Account

Most API's require you to do a little pre-work in order to be able to use them, so the first part of this homework assignment will be spent setting up your developer account so you have API Access.

**Step 1:  Create a Twitter Developer Account**

 - Make sure you have a regular twitter account before you do this
 - You can apply for a developer account here:  https://developer.twitter.com/en/apply-for-access
  - Choose either a student or hobbyist/personal account
  - **note:** these typically get approved right away, but it's possible you might have to wait a little bit......if 15 minutes pass, it might be best to take a break and come back in an hour or so.

**Step 2:  Create An App**

You don't have to intend to build an official software program to have an app.....this is just a way for you to get authentication keys to use with the API.

 - Go to the menu in the upper right hand corner and click on **Your Name** > **Apps**
 - Choose **Create An App**
 - You'll be prompted to enter some information about your app.  Don't worry too much about this, it can say almost anything.  You'll be prompted to list websites where it will be hosted...this can be anything for now.  Use https://generalassemb.ly if you're undecided about what to put.

**Step 3: Create Your API Tokens**

Now that you have an app, you can use its API tokens to go ahead and make requests like we did in class 3.  Like a lot of API's, the Twitter API uses something called OAuth authentication.  

If you didn't wait until the night before this assignment was due and have a spare 30 minutes, you can read a little about it here: https://oauth.net/

In any event, you need API tokens in order to make requests.  Do the following:

 - Go to the **Apps** section of your developer portal
 - Click on the **Details** button for the app that you just created
 - Click on the **Keys & Tokens** tab:
   - Two keys should already be given to you:  **API Key** & **API Secret Key**
   - Two you have to generate:  **Access token** & **Access token secret**
 - Generate your Access Token and Access Token Secret keys.  You'll need to write these down when you're done -- you can only see them once.

Now you're ready to make requests to the Twitter api.  Everytime you make a request, you'll need to include the 4 tokens you just created.  (You can always regenerate them for whatever reason).  

**Step 4:  Your First Request**

To make requests to the Twitter API you're going to need a module which is **not** already pre-installed in Anaconda. You'll need to install it via PIP, which is python's package manager.  It's called `requests_oauthlib`.  You can install this via Anaconda Prompt or Terminal by simply typing in the command `pip install requests_oauthlib`, and then you'll be finished.

The logistics of making an OAuth1 authenticated request are very similar to what was done in class 3, but with a few additional steps.  You can see how to do it here:  https://requests.readthedocs.io/en/master/user/authentication/#oauth-1-authentication.  The only thing you'll need to change is the info for your API tokens that are passed into the `OAuth1()` function.

Try making a request to the following URL to confirm that you have things set up correctly: 'https://api.twitter.com/1.1/account/verify_credentials.json'

In [1]:
# your code here
import requests
from requests_oauthlib import OAuth1
url ='https://api.twitter.com/1.1/account/verify_credentials.json'
auth = OAuth1('DeWbsfI9l3XhzARsLTo1kHYFL','m6JkY76aQv7zgJaApxz9IitfgfnlKCRo1cIkaqK3zqbQanDKk4', '1291070701408002050-ft6VvAOaqSvW77CEr5kWN6NzyHavPy','hXS9DBpTuFXnXwODPfgoPnjNSXcRKmWVqFG72uPXjthlu')

requests.get(url,auth=auth)


<Response [200]>

If you get your json object back, then you're good to go.

## Section 2: Searching Tweets

Most websites you access will have a long string attached to the end of them that look something like this:  `http://thewebsite.com/?year=2019&color=golden%20yellow&user_id=48549395959438`.

Most people have no reason to pay attention to any of this, but all the special symbols at the end are basically encoded commands that say 'return a website that displays x,y,z characteristics.'  

When accessing api data, it basically works the same way.

**Step 1:  Set Up Your First Query String**

If you go into Twitter and search for the term `Data Science`, you should be brought to a url that looks like this:  `https://twitter.com/search?q=Data%20Science&src=typed_query`

If you'd like, you can drop the `&src=typed_query` from the url and still get the same results.

There are some important details to pay attention to:

 - Like class 3 when we worked with GitHub, there is a **base url**.  In this case it's `https://twitter.com/search`
 - Whenever you enter a search for something, the base url will be followed by something that looks like `?q=My%20Search%20Term`
  - The `?` marks the beginning of the query string.  This basically says 'initiate a request with whatever parameters that follow'
  - The `q` is a **parameter**, essentially some condition to pass into the query string that determines what results will be given back to you.  In this case, `q` encodes the text you typed in into something the API can understand.
  
**Useful Thing To Do Right Now:** Go back to the Twitter search page, and just try searching for different things, and notice what shows up after the `q=`.  Here are some questions to ask yourself:

 - How are white spaces encoded?  Ie, if you search for `Jonathan Bechtel` in the search box, what shows up to account for the space between the two words?
 - What about hash symbols?  If you search for `#MeToo`, `#GirlsWhoCode` or `#DataScience`, what happens with that `#` symbol?
 - Once you get the hang of this, see if you can just re-create some searches yourself by creating the url directly, and bypassing the search box altogether.  Ie, be familiar enough with how searches are formatted that you know `https://twitter.com/search?q=%23DataScience` will take you to the same page as typing in `#DataScience` into the search box.

Now, let's try and make a request for a search for `Data Science`.  

If you look at Twitter's docs, you'll see that the base url for the search API is `'https://api.twitter.com/1.1/search/tweets.json`

This means you have to add the `?q=Whatever%20Word%20%Goes%20Here` to the end to complete the search.

So go ahead, and see if you can create your API call for a search for the term `Data Science`.

If you did it correctly, you should have a dictionary with a key called `statuses`, and it'll be a list with all of the tweets returned by your search.  

In [15]:
# your answer here
url ='https://api.twitter.com/1.1/search/tweets.json?q=data%20science'
datascience=requests.get(url,auth=auth)

In [16]:
datascience.json()

{'statuses': [{'created_at': 'Mon Aug 10 14:23:55 +0000 2020',
   'id': 1292829032401317889,
   'id_str': '1292829032401317889',
   'text': 'Indeed! My course sequence would be: algebra I-&gt; algebra II -&gt; statistics -&gt; data science https://t.co/GZ5kXuhRT9',
   'truncated': False,
   'entities': {'hashtags': [],
    'symbols': [],
    'user_mentions': [],
    'urls': [{'url': 'https://t.co/GZ5kXuhRT9',
      'expanded_url': 'https://twitter.com/austen/status/1292608488367984640',
      'display_url': 'twitter.com/austen/status/…',
      'indices': [99, 122]}]},
   'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
   'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
   'in_reply_to_status_id': None,
   'in_reply_to_status_id_str': None,
   'in_reply_to_user_id': None,
   'in_reply_to_user_id_str': None,
   'in_reply_to_screen_name': None,
   'user': {'id': 841280011164368897,
    'id_str': '841280011164368897',
    'na

For good measure, try doing a search for tweets relating to `#MeToo` as well.

In [128]:
# your answer here
url ='https://api.twitter.com/1.1/search/tweets.json?q=%23metoo&src=typed_query'
metoo=requests.get(url,auth=auth)

In [129]:
metoo.json()

{'statuses': [{'created_at': 'Sun Aug 09 08:53:00 +0000 2020',
   'id': 1292383364683575298,
   'id_str': '1292383364683575298',
   'text': 'RT @Fnordspotting: #RefugeesWelcome, #MeToo, #FridaysForFuture och nu senast #BlackLivesMatter. Ser ni ett mönster här? Detta är de postmod…',
   'truncated': False,
   'entities': {'hashtags': [{'text': 'RefugeesWelcome', 'indices': [19, 35]},
     {'text': 'MeToo', 'indices': [37, 43]},
     {'text': 'FridaysForFuture', 'indices': [45, 62]},
     {'text': 'BlackLivesMatter', 'indices': [77, 94]}],
    'symbols': [],
    'user_mentions': [{'screen_name': 'Fnordspotting',
      'name': 'Fnordspotting',
      'id': 1447363795,
      'id_str': '1447363795',
      'indices': [3, 17]}],
    'urls': []},
   'metadata': {'iso_language_code': 'sv', 'result_type': 'recent'},
   'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
   'in_reply_to_status_id': None,
   'in_reply_to_status_id_str': None,
   'in_reply

**Step 2:  Adding Parameters to Your Query String**

Query strings basically have two parts:

 - The `?` initiates the beginning of the API call, and basically says 'everything that follows this will encode something about the information that's going to get returned to you'.
 - What follows that is are a bunch of symbols followed by `=` signs.  These are parameters.
 - So when you make an api call to `'https://api.twitter.com/1.1/search/tweets.json?q=My%20Search%20Term`, the `q` is a paremeter.  
 - You can add multiple paremeters to a query string. They are separated by `&`. They dictate what kinds of results are returned.  
  - For example, a parameter you can use in Twitter's search API is `count`, which tells you how many results to return.  The default is 15, but you can return up to 100.  So if we wanted to search for tweets and return 50 results our query string would look like the following:
    `https://api.twitter.com/1.1/search/tweets.json?q=My%20Search%20String&count=50`
  - You can add as many of these parameters to your string as you'd like.  So for example, if we wanted to include parameters for `count` and `result_type`, we could do the following: `https://api.twitter.com/1.1/search/tweets.json?q=My%20Search%20String&count=50&result_type=mixed`
  
To get the hang of this, try searching for tweets that mention the hashtag `#DeepLearning`, and return 75 results.

In [130]:
url ='https://api.twitter.com/1.1/search/tweets.json?q=%23deeplearning&count=75'
deeplearn=requests.get(url,auth=auth).json()

In [131]:
deeplearn

{'statuses': [{'created_at': 'Sun Aug 09 08:53:31 +0000 2020',
   'id': 1292383496636424193,
   'id_str': '1292383496636424193',
   'text': 'RT @SamuelAkins12: This is a day at the beach in 1896!!  All hail 👉🏾 Deep learning algorithms for enhanced image interpolation, HD, color e…',
   'truncated': False,
   'entities': {'hashtags': [],
    'symbols': [],
    'user_mentions': [{'screen_name': 'SamuelAkins12',
      'name': 'Samuel Akinosho®',
      'id': 1053247241618685953,
      'id_str': '1053247241618685953',
      'indices': [3, 17]}],
    'urls': []},
   'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
   'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
   'in_reply_to_status_id': None,
   'in_reply_to_status_id_str': None,
   'in_reply_to_user_id': None,
   'in_reply_to_user_id_str': None,
   'in_reply_to_screen_name': None,
   'user': {'id': 22000414,
    'id_str': '22000414',
    'name': 'Kiwi Paul',
    'screen

Try adding a second parameter.  You can find the list here:  https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets

In [132]:
url ='https://api.twitter.com/1.1/search/tweets.json?q=%23deeplearning&count=75&result_type=recent'
deeplearn=requests.get(url,auth=auth).json()

In [133]:
deeplearn

{'statuses': [{'created_at': 'Sun Aug 09 08:53:31 +0000 2020',
   'id': 1292383496636424193,
   'id_str': '1292383496636424193',
   'text': 'RT @SamuelAkins12: This is a day at the beach in 1896!!  All hail 👉🏾 Deep learning algorithms for enhanced image interpolation, HD, color e…',
   'truncated': False,
   'entities': {'hashtags': [],
    'symbols': [],
    'user_mentions': [{'screen_name': 'SamuelAkins12',
      'name': 'Samuel Akinosho®',
      'id': 1053247241618685953,
      'id_str': '1053247241618685953',
      'indices': [3, 17]}],
    'urls': []},
   'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
   'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
   'in_reply_to_status_id': None,
   'in_reply_to_status_id_str': None,
   'in_reply_to_user_id': None,
   'in_reply_to_user_id_str': None,
   'in_reply_to_screen_name': None,
   'user': {'id': 22000414,
    'id_str': '22000414',
    'name': 'Kiwi Paul',
    'screen

## Section 3: Searching Users

The last section of the API you'll need to get the hang of before you're let loose is the users API, which allows you to search for users and get their followers, friends, etc, as opposed to tweets which fit a particular criteria.  This part is pretty similar to the advanced lab in class 3, so if you saw how that worked then you shouldn't need much instruction.  

But if you're seeing this with fresh eyes, you'll want to spend 15-20 minutes to make sure you understand this part.  

Official documentation can be found here:  https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/overview

So, as an example, if you want to get a list of someone's followers, you use the base url `https://api.twitter.com/1.1/followers/list.json` and then enter your query string to get a list of that person's followers.  

List of parameters to use can be found here:  https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/api-reference/get-followers-list

One possible parameter to use is `screen_name`, so if you wanted to get a list of someone's followers based on their screen name (the handle that begins with an @), then you would set up your API call to look something like:

`https://api.twitter.com/1.1/followers/list.json?screen_name=persons_screenname`

Note that you exclude the `@`.

**Your turn:** Pull in the list of General Assembly's followers.  General Assembly's handle is `@GA`.

Note that this won't return the whole list of GA's users.  If you want to do that you have to use cursoring:  https://developer.twitter.com/en/docs/basics/cursoring.  This is the topic of your bonus assignment.

In [134]:
# your answer here
url ='https://api.twitter.com/1.1/followers/list.json?screen_name=GA'
GA=requests.get(url,auth=auth).json()

In [135]:
url ='https://api.twitter.com/1.1/users/show.json?screen_name=GA'

GAuser=requests.get(url,auth=auth).json()

In [136]:
GAuser

{'id': 170393291,
 'id_str': '170393291',
 'name': 'General Assembly',
 'screen_name': 'GA',
 'location': '',
 'profile_location': None,
 'description': 'We transform careers and teams — including more than one third of the Fortune 100 — through dynamic courses in coding, data, design, and business.',
 'url': 'https://t.co/YQeEXPxJ4H',
 'entities': {'url': {'urls': [{'url': 'https://t.co/YQeEXPxJ4H',
     'expanded_url': 'http://ga.co/Twitter',
     'display_url': 'ga.co/Twitter',
     'indices': [0, 23]}]},
  'description': {'urls': []}},
 'protected': False,
 'followers_count': 165519,
 'friends_count': 5416,
 'listed_count': 3263,
 'created_at': 'Sat Jul 24 18:19:59 +0000 2010',
 'favourites_count': 36405,
 'utc_offset': None,
 'time_zone': None,
 'geo_enabled': True,
 'verified': True,
 'statuses_count': 22756,
 'lang': None,
 'status': {'created_at': 'Sat Aug 08 17:27:02 +0000 2020',
  'id': 1292150339995607045,
  'id_str': '1292150339995607045',
  'text': '5 benefits of coding th

In [137]:
GA

{'users': [{'id': 14868552,
   'id_str': '14868552',
   'name': 'Игорб',
   'screen_name': 'huipizdayebatsy',
   'location': 'Moscow',
   'description': 'https://t.co/5PajhCpOCg',
   'url': None,
   'entities': {'description': {'urls': [{'url': 'https://t.co/5PajhCpOCg',
       'expanded_url': 'https://ttttt.me/poshel_na_hui',
       'display_url': 'ttttt.me/poshel_na_hui',
       'indices': [0, 23]}]}},
   'protected': False,
   'followers_count': 1686,
   'friends_count': 755,
   'listed_count': 28,
   'created_at': 'Thu May 22 14:06:48 +0000 2008',
   'favourites_count': 6546,
   'utc_offset': None,
   'time_zone': None,
   'geo_enabled': True,
   'verified': False,
   'statuses_count': 34306,
   'lang': None,
   'status': {'created_at': 'Sat Aug 08 21:57:35 +0000 2020',
    'id': 1292218423812988931,
    'id_str': '1292218423812988931',
    'text': 'RT @UKRaveComments: Why do tigers get lost all the time?\n...\n...\n...\n...\n...\nBecause the jungle is massive!',
    'truncated': F

In [138]:
url ='https://api.twitter.com/1.1/followers/list.json?screen_name=GA&cursor=1674310506286027278'
GA2=requests.get(url,auth=auth).json()

In [139]:
url ='https://api.twitter.com/1.1/followers/list.json?screen_name=GA&cursor=1674254494235162030'
GA3=requests.get(url,auth=auth).json()

In [140]:
def find_user(user):
    import requests
    user=user.strip('@')
    d={}
    url =f'https://api.twitter.com/1.1/users/show.json?screen_name={user}'
    userdata=requests.get(url,auth=auth).json()
    d ={
        'Name': [userdata['name']],
        'screen_name' : [userdata['screen_name']],
        'followers_count'  : [userdata['followers_count']],
        'friends_count' : [userdata['friends_count']]
        }
    return d
    


In [141]:
url =f'https://api.twitter.com/1.1/users/show.json?screen_name=GA'
userdata=requests.get(url,auth=auth).json()

In [142]:
url =f'https://api.twitter.com/1.1/users/show.json?screen_name=GA'
requests.get(url, auth=auth).json()

{'id': 170393291,
 'id_str': '170393291',
 'name': 'General Assembly',
 'screen_name': 'GA',
 'location': '',
 'profile_location': None,
 'description': 'We transform careers and teams — including more than one third of the Fortune 100 — through dynamic courses in coding, data, design, and business.',
 'url': 'https://t.co/YQeEXPxJ4H',
 'entities': {'url': {'urls': [{'url': 'https://t.co/YQeEXPxJ4H',
     'expanded_url': 'http://ga.co/Twitter',
     'display_url': 'ga.co/Twitter',
     'indices': [0, 23]}]},
  'description': {'urls': []}},
 'protected': False,
 'followers_count': 165519,
 'friends_count': 5416,
 'listed_count': 3263,
 'created_at': 'Sat Jul 24 18:19:59 +0000 2010',
 'favourites_count': 36405,
 'utc_offset': None,
 'time_zone': None,
 'geo_enabled': True,
 'verified': True,
 'statuses_count': 22756,
 'lang': None,
 'status': {'created_at': 'Sat Aug 08 17:27:02 +0000 2020',
  'id': 1292150339995607045,
  'id_str': '1292150339995607045',
  'text': '5 benefits of coding th

In [143]:
def find_user(user, keys=None):
    user=user.strip('@')
    url =f'https://api.twitter.com/1.1/users/show.json?screen_name={user}'
    userdata=requests.get(url,auth=auth).json()
    
    if keys != None:
       #userdata = {}
        #print(userdata)
        for key in keys:
            print(key)
            userdata_keys = userdata[key]
            return userdata_keys
    else:
        return userdata



In [1]:
def find_user(user, keys=None):
    user=user.strip('@')
    url =f'https://api.twitter.com/1.1/users/show.json?screen_name={user}'
    userdata=requests.get(url,auth=auth).json()
    
    if keys != None:
        userdata_keys = {}
        for key in keys:
            userdata_keys[key] = userdata[key]
        return userdata_keys
    else:
        return userdata

In [2]:
find_user('GA',  keys=['name', 'url', 'created_at'])

NameError: name 'requests' is not defined

In [146]:
url ='https://api.twitter.com/1.1/search/tweets.json?q=%23deeplearning&count=75&result_type=recent,popular'
requests.get(url,auth=auth).json()

{'statuses': [{'created_at': 'Sun Aug 09 08:54:07 +0000 2020',
   'id': 1292383645915701249,
   'id_str': '1292383645915701249',
   'text': '#Unsplash’s #dataset is now #opensource #images #DeepLearning https://t.co/QmZQScEuaM',
   'truncated': False,
   'entities': {'hashtags': [{'text': 'Unsplash', 'indices': [0, 9]},
     {'text': 'dataset', 'indices': [12, 20]},
     {'text': 'opensource', 'indices': [28, 39]},
     {'text': 'images', 'indices': [40, 47]},
     {'text': 'DeepLearning', 'indices': [48, 61]}],
    'symbols': [],
    'user_mentions': [],
    'urls': [{'url': 'https://t.co/QmZQScEuaM',
      'expanded_url': 'https://unsplash.com/blog/the-unsplash-dataset/',
      'display_url': 'unsplash.com/blog/the-unspl…',
      'indices': [62, 85]}]},
   'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
   'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
   'in_reply_to_status_id': None,
   'in_reply_to_status_id_str'

In [147]:
def find_hashtag(hashtag, count=None, search_type=None):
   
    if not hashtag.startswith('#'):
        hashtag='%23'+hashtag
    else:
        hashtag='%23' + hashtag.strip('#')

    if search_type != None:
        searchtype=search_type.replace('/', ',')
        url =f'https://api.twitter.com/1.1/search/tweets.json?q={hashtag}&count={count}&result_type={searchtype}'
        hashtag_search=requests.get(url,auth=auth).json()
    else:
        url =f'https://api.twitter.com/1.1/search/tweets.json?q={hashtag}&count={count}'
    return requests.get(url,auth=auth).json()


In [148]:
find_hashtag('#DataScience', count=50)

{'statuses': [{'created_at': 'Sun Aug 09 08:54:10 +0000 2020',
   'id': 1292383660604358657,
   'id_str': '1292383660604358657',
   'text': "RT @realColinMac: 💠 Did you hear about the #programmer arrested before they could check their #code?...Arrested for a crime; they didn't co…",
   'truncated': False,
   'entities': {'hashtags': [{'text': 'programmer', 'indices': [43, 54]},
     {'text': 'code', 'indices': [94, 99]}],
    'symbols': [],
    'user_mentions': [{'screen_name': 'realColinMac',
      'name': 'Colin McGuire',
      'id': 892214142672683008,
      'id_str': '892214142672683008',
      'indices': [3, 16]}],
    'urls': []},
   'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
   'source': '<a href="https://labnol.org/" rel="nofollow">Web design Retweet</a>',
   'in_reply_to_status_id': None,
   'in_reply_to_status_id_str': None,
   'in_reply_to_user_id': None,
   'in_reply_to_user_id_str': None,
   'in_reply_to_screen_name': None,
   'user': {'id': 12669089

In [149]:
def get_followers(screen_name, keys=['name', 'followers_count', 'friends_count', 'screen_name'], to_df=False):
    screen_name=screen_name.strip('@')
    url =f'https://api.twitter.com/1.1/followers/list.json?screen_name={screen_name}'
    results=requests.get(url,auth=auth).json()['users']
    
#    dict_users={}
#    dict_users ={
#        'Name': [user['name'] for user in results],
#        'followers_count': [user['followers_count'] for user in results],
#        'friends_count': [user['friends_count'] for user in results],
#        'screen_name': [user['screen_name'] for user in results],      
 #   }
    
    dict_users={}
    for i in keys:
        dict_users[i] = [user[i] for user in results]
    

    if to_df:
        import pandas as pd
        usersdf = pd.DataFrame(dict_users)
#        d_df=pd.DataFrame(d)
        return usersdf
    
    return results
    
    

In [156]:
get_followers('GA',  to_df=True)

Unnamed: 0,name,followers_count,friends_count,screen_name
0,Игорб,1686,755,huipizdayebatsy
1,karly smith,1,25,karlymsdev
2,German Diaz,112,727,gdiaz324
3,Elizabeth,2,34,puffdivastarr
4,HURPSY IMAGERY 📸,249,2511,hurpsy4eva
5,Denise Bagans 🧛‍♀️🧡🧛‍♂️,64,165,BagansDenise
6,WeDiversify,7,102,WeDiversifyMe
7,Priya,429,1062,PriyaKour007
8,Principe Cabrera,5,127,PrincipeCabrera
9,Cameron Baumgartner,229,543,cameronhbg


## Section 4: Functions

This section details the functions you have to write and turn in as part of your homework assignment.  

Please read the requirements carefully.

**What you'll turn in:** A `.py` file with all of the functions written.  We should be able to load this into an IDE, run the file, and then call your functions to verify how and if they work. This file should also be properly commented so we can follow your line of reasoning.

The functions you'll be prompted to write will be defined in the following ways:

 - **name:** the name of the function
 - **returns:** what the function should return
 - **arguments:** arguments to include inside the function in order to specify how it should behave.
 
 **Note:** The free API has limitations built into it, so this means from time-to-time you'll only be able to return some of the results from the API.  This is fine.  It's understood and recognized that your functions won't be able to return an entire list of someone's users or other such things, so as long as your work delivers the best it can under present circumstances you'll be in good shape.
 
 **Other Note:** Every aspect of the API that you need to use can be found on either of these pages.
 
 Search API:  https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets
 
 Users API: https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/api-reference
 
**Remarks About Your Final Work**

 - It's okay if you get stuck somewhere.  If there's one item that you can't figure out and it doesn't quite work right, it's probably best to move on and try other things.  Again, try and explain what you were looking to do.  You'll pass if you give an honest effort.
 - There's potentially a lot of error handling you could do to verify user input is correct, but you can leave that alone for now.  Just make sure the core purpose of the function works the way it's supposed to.
 - While you're working on this, it's possible you may bump into your API limits.  Keep this in mind if you have a function that's working, but 45 minutes later it doesn't and you haven't changed anything.  This usually means the data you're getting back from your API calls isn't what it's supposed to be because you've exhausted your limits. We won't hold you to double checking for all of this in your functions.
 - In the file you turn in, make sure your requests are referencing your API tokens, so that way we can run your file right away.  Ie, make sure somewhere in your script you have a variable at the top that reads `tokens = OAuth1('token1', 'token2', 'token3', 'token4')` so it can be used for your requests inside the file.

##### Function 1 (Required)

**Name:** `find_user`

**Returns:** dictionary that represents user object returned by Twitter's API

**Arguments:**
 - `screen_name`: str, required; Twitter handle to search for.  **Can include the @ symbol.  The function should check for this and remove it if necessary.**
 - `keys`: list, optional; list that contains keys to return about user object.  If not specified, then function should return the entire user object.  **These only need to be outer keys.** If they are keys nested within another key, you don't have to account for this.
 
**To test:** We'll test your function in the following ways:

 - `find_user('@GA')`
 - `find_user('GA')`
 - `find_user('GA', keys=['name', 'screen_name', 'followers_count', 'friends_count'])`

##### Function 2 (Required)

**Name:** `find_hashtag`

**Returns:** list of data objects that contain information about each tweet that matches the hashtag provided as input.

**Arguments:**
 - `hashtag`: str, required; text to use as a hashtag search.  
 - `count`: int, optional; number of results to return
 - `search_type`: str, optional; type of results to return.  should accept 3 different values:
   - `mixed`:   return mix of most recent and most popular results
   - `recent`:  return most recent results
   - `popular`: return most popular results
   
**Note:** User should **not** have to actually use the `#` character for the `hashtag` argument.  The function should check to see if it's there, and if not, add it in for them.

**To Test:**  We'll check your function in the following ways:
 - `find_hashtag('DataScience')`
 - `find_hashtag('#DataScience')`
 - `find_hashtag('#DataScience', count=100)`, and double check the length of the `statuses` key to make sure it contains the right amount of results.  **Note:** Due to the version of the API we're using, the number of results returned will **not** necessarily match the value passed into the `count` parameter.  So if you specify 50 and it only returns 45, you are likely still doing it correctly.
 - `find_hashtag('#DataScience', search_type='recent/mixed/popular')`

##### Function 3 (Required)

**Name:** `get_followers`

**Returns:** list of data objects for each of the users followers, returning values for the `name`, `followers_count`, `friends_count`, and `screen_name` key for each user.

**Arguments:** 

 - `screen_name`: str, required; Twitter handle to search for.  **Results should not depend on user inputting the @ symbol.**
 - `keys`: list, required;  keys to return for each user.  default value: [`name`, `followers_count`, `friends_count`, `screen_name`]; if something else is listed, values for those keys should be returned
 - `to_df`: bool, required; default value: False; if True, return results in a dataframe.  Every value provided in the `keys` argument should be its own column, with rows populated by the corresponding values for each one for every user.
 
**To Test:** We'll test your functions in the following ways:

 - `get_followers('@GA')`
 - `get_followers('GA')`
 - `get_followers('GA', keys=['name', 'followers_count'])`
 - `get_followers('GA', keys=['name', 'followers_count'], to_df=True)`
 - `get_followers('GA', to_df=True)`

##### Function 4 (Optional)

**Name:** `friends_of_friends`

**Returns:** list of data objects for each user that two Twitter users have in common

**Arguments:**

 - `names`: list, required; list of two Twitter users to compare friends list with
 - `keys`: list, optional; list of keys to return for information about each user.  Default value should be to return the entire data object.
 - `to_df`: bool, required; default value: False; if True, returns results in a dataframe.
 
**To Test:** We'll test your function in the following ways:

 - `friends_of_friends(['Beyonce', 'MariahCarey'])`
 - `friends_of_friends(['@Beyonce', '@MariahCarey'], to_df=True)`
 - `friends_of_friends(['Beyonce', 'MariahCarey'], keys=['id', 'name'])`
 - `friends_of_friends(['Beyonce', 'MariahCarey'], keys=['id', 'name'], to_df=True)`
 
Each of these should return 3 results. (Assuming they haven't followed the same people since this was last written).  

**Hint:** The `id` key is the unique identifier for someone, so if you want to check if two people are the same this is the best way to do it.

In [10]:
def friends_of_friends(name1, name2, keys=None, to_df=False):
    url =f'https://api.twitter.com/1.1/friends/list.json?screen_name={name1}&count=200'
    results1=requests.get(url,auth=auth).json()['users']
    url =f'https://api.twitter.com/1.1/friends/list.json?screen_name={name2}&count=200'
    results2=requests.get(url,auth=auth).json()['users']
    
    results2_id=[item['id'] for item in results2]
    print(len(results2_id))
    listfriends=[]
#    for item in results1:
#        if item in results2:
#            listfriends.append(item)
#    print(len(listfriends))
#    return listfriends
#new_dict = {item['name']:item for item in data}
  #  listfriends=[item for item in results1 if item in results2]
    listfriends=[item for item in results1 if item['id'] in results2_id]
    
    print(len(listfriends))
    
    results_dict={}
    if keys !=None:
        for i in keys:
            results_dict[i] = [user[i] for user in listfriends]
    else:
        for k,v in [(key,d[key]) for d in listfriends for key in d]:
            if k not in results_dict: 
                results_dict[k]=[v]
            else: 
                results_dict[k].append(v)
        
    #    return results_keys
    #    print(type(results_keys))
   
   
    if to_df:
        import pandas as pd
        friendsdf = pd.DataFrame(results_dict)
        return friendsdf
    else:
        return results1

In [11]:
friends_of_friends('Beyonce', 'MariahCarey', to_df=True)

KeyError: 'users'

In [158]:
friends_of_friends('MariahCarey', keys=['name'] )

TypeError: friends_of_friends() missing 1 required positional argument: 'name2'

 ##### Function 5 (Optional)

Rewrite the `friends_of_friends` function, except this time include an argument called `full_search`, which accepts a boolean value.  If set to `True`, use cursoring to cycle through the complete set of users for the users provided.  

The twitter API only returns a subset of users in your results to save bandwidth, so you have to cycle through multiple result sets to get all of the values.

You can read more about how this works here:  https://developer.twitter.com/en/docs/basics/cursoring

Basically you have to do a `while` loop to continually make a new request using the values stored in the `next_cursor` key as part of your next query string until there's nothing left to search.

**Note:** We're using the free API, so we're operating under some limitations.  One of them being that you can only make 15 API calls in a 15 minute span to this portion of the API.  You can also only return up to 200 results per cursor, so this means you won't be able to completely search for everyone even if you set this up correctly.

That's fine, just do what you can under the circumstances.

**To Test:** To test your function, we'll run the following function calls:

 - `friends_of_friends(['ezraklein', 'tylercowen'])` -- should return 4 results if you do an API call that returns 200 results
 - `friends_of_friends(['ezraklein', 'tylercowen'], full_search=True)` -- should return 54 results if you do an API call that returns 200 results
 
**Hint:** Chances are you will exhaust your API limits quite easily in this function depending on who you search for.  Depending on how you have things set up, this could cause error messages to arise when things are otherwise fine.  Remember in class 3 when we were getting those weird dictionaries back because our limits were used up?  We won't hold you accountable for handling this inside your function, although it could make some things easier for your own testing.
       
Good luck!

In [9]:
#Function searches friends of two Twitter users and returns list of data objects for the friends in common
def friends_of_friends(names, keys=None, to_df=False, full_search=False):

    for item in names:  #loop through the names in the input list and create a list with the request outputs
        url =f'https://api.twitter.com/1.1/friends/list.json?screen_name={item}&count=200'
        list_users.append(requests.get(url,auth=auth).json()['users'])
        
#split the list into two lists, one for each user
    results1=list_users[0]
    results2=list_users[1]

#check the items in the two lists that have the same 'id' and append to a new list
    listfriends=[]
    listfriends=[item for item in results1 if item['id'] in [n['id'] for n in results2]]
    print(len(listfriends))

#create dictionary using keys if passed in the function arguments or all the keys available if not used as input
    results_dict={}
    if keys !=None:
        for i in keys:
            results_dict[i] = [user[i] for user in listfriends]
    else:
        for k,v in [(key,d[key]) for d in listfriends for key in d]:
            if k not in results_dict: 
                results_dict[k]=[v]
            else: 
                results_dict[k].append(v)
#build a table (pandas data frame from the dictionary) if to_df is True
    if to_df:
        import pandas as pd
        friendsdf = pd.DataFrame(results_dict)
        return friendsdf
    else:
        return results1

In [12]:
#Function searches friends of two Twitter users and returns list of data objects for the friends in common
def friends(user, keys=None, to_df=False, full_search=False): 
    url =f'https://api.twitter.com/1.1/friends/list.json?screen_name={user}'
    results=requests.get(url,auth=auth).json()
    nextcursor=results['next_cursor']
   # print(nextcursor)
    
    if full_search:
        list_results=[]
        i=1
        for i in range(1,5):
        #while nextcursor !=0:
            url =f'https://api.twitter.com/1.1/friends/list.json?screen_name={user}&cursor={nextcursor}'
            results_int=requests.get(url,auth=auth).json()
            #list_results.append(requests.get(url,auth=auth).json())
            nextcursor=results_int['next_cursor']
            print(nextcursor)
            i+=1
        return results_int
    else:
        return results
    #return results


    

In [17]:
friends('MariahCarey', full_search=True)

KeyError: 'next_cursor'

In [7]:
def friends(user, keys=None, to_df=False, full_search=False): 
    url =f'https://api.twitter.com/1.1/friends/list.json?screen_name={user}'
    results=requests.get(url,auth=auth).json()
    nextcursor=results['next_cursor']
    print(nextcursor)
    
    if full_search:
        list_results=[]
       # i=1
        #for i in range(1,5):
        while nextcursor !=0:
            url =f'https://api.twitter.com/1.1/friends/list.json?screen_name={user}&cursor={nextcursor}'
            results_int=requests.get(url,auth=auth).json()
            list_results.append(results_int)
            nextcursor=results_int['next_cursor']
            print(nextcursor)
         
        return list_results
    else:
        return results

In [8]:
friends('MariahCarey', full_search=True)

1644509744268087944
1636070215752200670
1621056923412275992
1613951663879374420
1518224439813049412
1439653401947108330
1364006135987682268
1306591519310498875


KeyError: 'next_cursor'