# `osometweet` crash course

- Author: Matthew R. DeVerna (🐥= `@mdeverna2`)
- Date: September 3rd, 2021

---

## Important Stuff

- [**GitHub source code**](https://github.com/osome-iu/osometweet)
- [**GitHub wiki**](https://github.com/osome-iu/osometweet/wiki)
- [**Example code**](https://github.com/osome-iu/osometweet/tree/master/examples)

---

## Crash Course Contents

#### General

- [Installation](#installation)
- [Quick start](#quick-start)
- [Authorization](#authorization)
  - [OAuth1a (user-context)](#oauth1a)
  - [OAuth2 (app-context aka w. bearer token)](#oauth2)

#### [Endpoints](#endpoints)

- [Tweet Lookup](#tweet-lookup)
- [User Lookup](#user-lookup)
  - [With user IDs](#with-ids)
  - [With usernames](#with-usernames)
- [Timelines](#timelines)
  - [User](#user-timeline)
  - [Mentions](#mentions-timeline)
  - [Specify the number of tweets returned](#num-tweets-returned)
  - [Pagination](#pagination)
- [Follows](#follows)
  - [Following](#following)
  - [Followers](#followers)
  - [Specify the number of accounts returned](#num-results-returned)
  - [Pagination](#pagination2)
- [Streaming](#streaming)
  - [Filtered streaming](#filtered-streaming)
      - [Adding filter rules](#adding-filter-rules)
      - [Retrieving filter rules](#retrieving-filter-rules)
      - [Connecting to the filtered stream endpoint](#connecting-to-the-filtered-stream-endpoint)
      - [Deleting filter rules](#deleting-filter-rules)
  - [Sampled streaming](#sampled-streaming)
- [Search](#search)
  - [Recent search](#recent-search)
  - [Full archive search](#full-archive-search)

#### [Fields and Expansions](#fields-and-expansions)
- [Give me everything](#give-me-everything)
- [All from one field](#all-one-field)
- [Specific fields](#specific-fields)
- [More on fields](#more-on-fields)

#### [Utility Functions](#utility-functions)
- [`pause_until`](#pause-until)
- [`chunker`](#chunker)
- [`convert_date_to_iso`](#convert-date)

#### [Wrangle Functions](#wrangle-functions)
- [`flatten_dict`](#flatten-dict)
- [`get_dict_paths`](#get-dict-paths)
- [`get_dict_val`](#get-dict-val)


---


<a id=installation></a>

## Installation

In [1]:
# From your Jupyter notebook
!pip install osometweet

# From the command line
# pip install osometweet



<a id=quick-start></a>

## Quick start

In [2]:
import osometweet
import os

# Initialize the OSoMeTweet object
# bearer_token = "YOUR_TWITTER_BEARER_TOKEN"
bearer_token = os.environ.get("TWITTER_BEARER_TOKEN")

oauth2 = osometweet.OAuth2(bearer_token=bearer_token)
ot = osometweet.OsomeTweet(oauth2)

# Set some test IDs (these are Twitter's own accounts)
ids2find = ["2244994945", "6253282"]

# Call the function with these ids as input
response = ot.user_lookup_ids(user_ids=ids2find)
print(response["data"])


[{'id': '2244994945', 'name': 'Twitter Dev', 'username': 'TwitterDev'}, {'id': '6253282', 'name': 'Twitter API', 'username': 'TwitterAPI'}]


<a id=authorization></a>

## Authorization

<a id=oauth1a></a>

### OAuth1a (user-context)

In [3]:
import osometweet

api_key = os.environ.get("TWITTER_API_KEY")
api_key_secret = os.environ.get("TWITTER_API_KEY_SECRET")
access_token = os.environ.get("TWITTER_ACCESS_TOKEN")
access_token_secret = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET")

oauth1a = osometweet.OAuth1a(
    api_key=api_key,
    api_key_secret=api_key_secret,
    access_token=access_token,
    access_token_secret=access_token_secret
)
oauth1a

<osometweet.oauth.OAuth1a at 0x7fc1e7497d10>

<a id=oauth2></a>

### OAuth2 (app-context aka w. bearer token)

In [4]:
import osometweet

bearer_token = os.environ.get("TWITTER_BEARER_TOKEN")
oauth2 = osometweet.OAuth2(
    bearer_token=bearer_token,
    manage_rate_limits=True
)
oauth2

<osometweet.oauth.OAuth2 at 0x7fc1e70f5910>

In [5]:
oauth2._manage_rate_limits

True
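
The authorization cells above read credentials with `os.environ.get`, which returns `None` when a variable is unset, so a missing token may only show up later as a failed request. Below is a minimal sketch of an earlier check. The helper name `require_env` is hypothetical and is not part of `osometweet`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty.

    Hypothetical helper for illustration only; it is not part of osometweet.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Example usage (assumes the variable was exported in your shell beforehand):
# bearer_token = require_env("TWITTER_BEARER_TOKEN")
# oauth2 = osometweet.OAuth2(bearer_token=bearer_token)
```

Failing at this point keeps the error message close to its cause instead of surfacing as an authorization error from the API.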

---
<a id=endpoints></a>
# Endpoints

<a id=tweet-lookup></a>

## Tweet Lookup

In [6]:
tweet_ids = ['1323314485705297926', '1328838299419627525']

# Fetch the tweets information
response = ot.tweet_lookup(tweet_ids)
print(response["data"])


[{'id': '1323314485705297926', 'text': 'breathe'}, {'id': '1328838299419627525', 'text': 'some of you hating...\n\nbut we see you Fleeting 🧐'}]


<a id=user-lookup></a>

## User Lookup

<a id=with-ids></a>

### With user IDs

In [7]:

# Set some test IDs (these are Twitter's own accounts)
ids2find = ["2244994945", "6253282"]

# Call the function with these ids
response = ot.user_lookup_ids(ids2find)
print(response["data"])


[{'id': '2244994945', 'name': 'Twitter Dev', 'username': 'TwitterDev'}, {'id': '6253282', 'name': 'Twitter API', 'username': 'TwitterAPI'}]


<a id=with-usernames></a>

### With usernames

In [8]:
# Set some test usernames (these are Twitter's own accounts)
usernames2find = ["TwitterDev", "TwitterAPI"]

# Call the function with these usernames
response = ot.user_lookup_usernames(usernames2find)
print(response["data"])


[{'id': '2244994945', 'name': 'Twitter Dev', 'username': 'TwitterDev'}, {'id': '6253282', 'name': 'Twitter API', 'username': 'TwitterAPI'}]


<a id=timelines></a>

## Timelines

<a id=user-timeline></a>

### User

We call the function to get `@jack`'s (Jack Dorsey's) 10 most recent tweets.

The endpoint only supports user IDs, so we pass `@jack`'s ID (`12`) to the method.

In [9]:
response = ot.get_tweet_timeline('12')
response["data"]

[{'id': '1432137059666378755',
  'text': 'Grateful for @elonmusk &amp; @SpaceX ❤️ https://t.co/1s4TqV3IdH'},
 {'id': '1431636712078336002', 'text': '@alee Nah'},
 {'id': '1431460418086752257', 'text': 'https://t.co/khIRWPZ9SC'},
 {'id': '1431378268167577608', 'text': '@bisq_network Def! Cc:@brockm'},
 {'id': '1431377752549171209',
  'text': '@evankaloudis I love bisq! Many lessons to learn from it.'},
 {'id': '1431377414362386437', 'text': '@mdudas Thanks, Mike'},
 {'id': '1431337475344109571', 'text': '@edgett 🙏🏼🙏🏼🌅'},
 {'id': '1431323561982054401', 'text': '@deppilf This makes Square better'},
 {'id': '1431321683856605192', 'text': '* @TBD54566975'},
 {'id': '1431320761474617344',
  'text': 'We’ve determined @TDB54566975’s direction: help us build an open platform to create a decentralized exchange for #Bitcoin https://t.co/jHYWHy1qmu'}]

<a id=mentions-timeline></a>

### Mentions

In [10]:
response = ot.get_mentions_timeline('12')
response["data"]

[{'id': '1432400646863392768',
  'text': 'Super cool. I have #4, and like #14 and #17\n\ntz1c4dhuL6s3mF8i8h8Pw9UwjmHQmbJZFnMb\n\n#Tezos #HEN\n\n@BLM_SayHerName @jack https://t.co/sroIvZOTni'},
 {'id': '1432400563061215232',
  'text': '@TheDragonDoge @elonmusk @SpaceX @Space_Station @1goonrich @TheLondonCrypto @ItsDogeCoin @dogecoin @khloe @DaCryptoMonkey @IcedKnife @jack @NASA @rachelvmoon @sampepper @TheCryptoLove @JRNYcrypto @AltcoinDailyio @girlgone_crypto I believe this is a faithful project.The projector has a lot of attractions so hopefully the project will be better in the future and will be the best.\n\n@RKHrithik2\n@Sunita05450283\n@anil87822254\n\n#DragonDoge #Airdrop #Crypto'},
 {'id': '1432400556425945100',
  'text': '@Dr_ghost21 @24bvll @jack قفل حسابه قال كلمه محظوره 👆🏻'},
 {'id': '1432400541854830593',
  'text': '@JohnPeniel55510 @davetroy @BrianCrockett3 @gpduf @CyrusAParsa1 @MichaelSalla @CIA @kayvz Then we have this 1️⃣ @kayvz 🤔™️  👽\n\nCan you look into that #BUDDY .

<a id=num-tweets-returned></a>

### Specifying the number of tweets returned

Often we need more data than Twitter returns in one request by default. The `max_results` parameter lets us request up to 100 tweets at a time, which has large implications for query limits (i.e., how many tweets you can get with the same number of requests). Here is an example.

Call the function to get `@jack`'s 100 most recent tweets:

In [11]:
response = ot.get_tweet_timeline('12', max_results=100)

In [12]:
print(f"Now we have {len(response['data'])} tweets.")
print("~~~~~~~~~~~~~~~~~~~~~~")
response["data"]

Now we have 100 tweets.
~~~~~~~~~~~~~~~~~~~~~~


[{'id': '1432137059666378755',
  'text': 'Grateful for @elonmusk &amp; @SpaceX ❤️ https://t.co/1s4TqV3IdH'},
 {'id': '1431636712078336002', 'text': '@alee Nah'},
 {'id': '1431460418086752257', 'text': 'https://t.co/khIRWPZ9SC'},
 {'id': '1431378268167577608', 'text': '@bisq_network Def! Cc:@brockm'},
 {'id': '1431377752549171209',
  'text': '@evankaloudis I love bisq! Many lessons to learn from it.'},
 {'id': '1431377414362386437', 'text': '@mdudas Thanks, Mike'},
 {'id': '1431337475344109571', 'text': '@edgett 🙏🏼🙏🏼🌅'},
 {'id': '1431323561982054401', 'text': '@deppilf This makes Square better'},
 {'id': '1431321683856605192', 'text': '* @TBD54566975'},
 {'id': '1431320761474617344',
  'text': 'We’ve determined @TDB54566975’s direction: help us build an open platform to create a decentralized exchange for #Bitcoin https://t.co/jHYWHy1qmu'},
 {'id': '1431089692070588419', 'text': '◽️◼️◽️'},
 {'id': '1431086396849197058',
  'text': 'RT @babykeem: 🍿 https://t.co/uPbzhPhooJ'},
 {'id': '1431
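
To make the query-limit point concrete, here is a small arithmetic sketch. It compares how many requests are needed to collect 3,200 tweets (the per-user timeline cap mentioned in the Pagination section) at 10 tweets per request, as in the first timeline example, and at `max_results=100`. The function name `requests_needed` is an illustrative assumption.

```python
import math


def requests_needed(n_tweets: int, max_results: int) -> int:
    """Number of requests needed to collect `n_tweets` at `max_results` tweets per request."""
    if n_tweets < 0 or max_results < 1:
        raise ValueError("n_tweets must be >= 0 and max_results must be >= 1")
    return math.ceil(n_tweets / max_results)


small_pages = requests_needed(3200, 10)    # 10 tweets per request
large_pages = requests_needed(3200, 100)   # 100 tweets per request
print(small_pages, large_pages)  # 320 32
```

Requesting the maximum page size cuts the number of calls by a factor of ten, which leaves more of the rate limit for other users or queries.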

<a id=pagination></a>

### Pagination

For each user ID, Twitter allows you to request up to 3,200 of the most recent tweets, and up to 800 of the most recent tweets mentioning a user. Since you can only request (at most) 100 tweets at a time, you need to take the `next_token` value returned under the `meta` key of one response and pass it as the `pagination_token` argument of the next request. For example, to get the 200 most recent tweets you can do the following...

In [13]:
response = ot.get_tweet_timeline('12', max_results=100)
response.keys()

dict_keys(['data', 'meta'])

In [14]:
response["meta"]

{'oldest_id': '1425610297465786369',
 'newest_id': '1432137059666378755',
 'result_count': 100,
 'next_token': '7140dibdnow9c7btw3z1m9c5sqjr2wvey0fxr9t3yxhrt'}

In [15]:
response_2 = ot.get_tweet_timeline(
    '12',
    pagination_token=response['meta']['next_token'],
    max_results = 100
)
response_2["meta"]

{'oldest_id': '1418288863710171139',
 'newest_id': '1425610015155572737',
 'result_count': 100,
 'next_token': '7140dibdnow9c7btw3z17a37gn03cu3x2pzwbhys44qih',
 'previous_token': '77qpymm88g5h9vqkluod2ouijqooqgxwahejqcwaoq9mf'}

In [16]:
response_2

{'data': [{'id': '1425610015155572737',
   'text': '@sodadecounty @DocDre We started wack'},
  {'id': '1425607896113496069',
   'text': 'RT @PeterChawaga: Of all the insight surfaced in this report, I was most struck by how widespread use of Bitcoin already seems to be in Cub…'},
  {'id': '1425563489121230849', 'text': '@zackvoell Bitcoin fixes this'},
  {'id': '1425504341243449348',
   'text': 'RT @StewYorkCity: The energy is crazy! TAP IN! https://t.co/dgrxvJaKfk'},
  {'id': '1425466730873331714',
   'text': '@kim_yle @TwitterSpaces @linukxxx ❤️'},
  {'id': '1425445801732546568',
   'text': '@sethforprivacy @optoutpod Eventually!'},
  {'id': '1425439030724923401',
   'text': "RT @LynAldenContact: I wrote an article about bitcoin's scaling pattern and energy usage:\nhttps://t.co/0m0RGXQ7E8"},
  {'id': '1425436742887383051',
   'text': '@sethforprivacy I’m supportive of your work, and here to learn.'},
  {'id': '1425253336677298176',
   'text': '@Niklauzi @jeetsidhu_ @MuunWallet No'},


<a id=follows></a>
## Follows

<a id=followers></a>
### Followers

In [17]:
# Call the function to get "jack"'s 100 most recent followers
# The endpoint only supports user IDs, so we pass the ID of @jack to the method
response = ot.get_followers('12')
response["data"]

[{'id': '1070079452753862656', 'name': 'Xixaux', 'username': 'Xixaux'},
 {'id': '921988909889855488',
  'name': 'Ryan Smithers',
  'username': 'RyanSmithers6'},
 {'id': '2200211286', 'name': '🆑', 'username': 'ConnorLeeming'},
 {'id': '1424994957677502465', 'name': 'Badar', 'username': 'Badar58075681'},
 {'id': '1359489776584335368',
  'name': 'dhanie.mischka',
  'username': 'DhanieMischka'},
 {'id': '1430001934782631938',
  'name': 'The Phone',
  'username': 'phoneanddeath'},
 {'id': '4898277603', 'name': 'Alfonso Glez.', 'username': 'alfon_glez'},
 {'id': '973007402', 'name': 'suciasabandija', 'username': 'suciasabandija'},
 {'id': '1320136737599987712',
  'name': 'Marko Jovičić',
  'username': 'JoviciMarko'},
 {'id': '1131001465881993217',
  'name': 'SafariSelva',
  'username': 'SafariSelvaOHI'},
 {'id': '976299708', 'name': 'Engin T.', 'username': 'EnginEngince'},
 {'id': '1432392432201748491',
  'name': 'Blair Colins',
  'username': 'blair_colins'},
 {'id': '1431188294386294785', '

<a id=following></a>
### Following

In [18]:
# Call the function to get the 100 most recent accounts that jack followed
response = ot.get_following('12')
response["data"]

[{'id': '410875968', 'name': 'Stephen DeLorme', 'username': 'StephenDeLorme'},
 {'id': '19433528', 'name': 'Aaliyah', 'username': 'AaliyahHaughton'},
 {'id': '1088489186821394432',
  'name': 'MONOGRAM',
  'username': 'monogramcompany'},
 {'id': '1193474189371543554',
  'name': '₿itcoin Q+A',
  'username': 'BitcoinQ_A'},
 {'id': '1237445215335723008',
  'name': 'FOUNDATION DEVICES',
  'username': 'FOUNDATIONdvcs'},
 {'id': '4131418094', 'name': 'Pablo Picasso', 'username': 'pablocubist'},
 {'id': '20535066', 'name': 'magicseaweed', 'username': 'magicseaweed'},
 {'id': '819808061556523009',
  'name': 'Jean-Michel Basquiat',
  'username': 'artistbasquiat'},
 {'id': '358333822', 'name': 'Loonardo Joe Vinci', 'username': 'wasthatawolf'},
 {'id': '14628341', 'name': 'Dmitry ButΞrin', 'username': 'BlockGeekDima'},
 {'id': '62079598', 'name': 'SADE', 'username': 'SadeAyodele'},
 {'id': '997471671723483136',
  'name': 'Whit Gibbs 🧭',
  'username': 'BitcoinBroski'},
 {'id': '255151550', 'name': 

<a id=num-results-returned></a>
### Specify the number of results

For the follows endpoints, `max_results` can be set to as many as 1,000 accounts per request.

In [19]:
# Call the function to get "jack"'s 1000 most recent followers
response = ot.get_followers('12', max_results=1000)

In [20]:
response["meta"]

{'result_count': 1000, 'next_token': 'TETQSN75D2SHEZZZ'}

In [21]:
print(f"Now we have {len(response['data'])} accounts.")
print("~~~~~~~~~~~~~~~~~~~~~~")
response["data"]

Now we have 1000 accounts.
~~~~~~~~~~~~~~~~~~~~~~


[{'id': '1070079452753862656', 'name': 'Xixaux', 'username': 'Xixaux'},
 {'id': '921988909889855488',
  'name': 'Ryan Smithers',
  'username': 'RyanSmithers6'},
 {'id': '2200211286', 'name': '🆑', 'username': 'ConnorLeeming'},
 {'id': '1424994957677502465', 'name': 'Badar', 'username': 'Badar58075681'},
 {'id': '1359489776584335368',
  'name': 'dhanie.mischka',
  'username': 'DhanieMischka'},
 {'id': '1430001934782631938',
  'name': 'The Phone',
  'username': 'phoneanddeath'},
 {'id': '4898277603', 'name': 'Alfonso Glez.', 'username': 'alfon_glez'},
 {'id': '973007402', 'name': 'suciasabandija', 'username': 'suciasabandija'},
 {'id': '1320136737599987712',
  'name': 'Marko Jovičić',
  'username': 'JoviciMarko'},
 {'id': '1131001465881993217',
  'name': 'SafariSelva',
  'username': 'SafariSelvaOHI'},
 {'id': '976299708', 'name': 'Engin T.', 'username': 'EnginEngince'},
 {'id': '1432392432201748491',
  'name': 'Blair Colins',
  'username': 'blair_colins'},
 {'id': '1431188294386294785', '

<a id=pagination2></a>
### Pagination

In [22]:
# Call the function to get "jack"'s 1,000 most recent followers
response = ot.get_followers('12', max_results = 1000)
response.keys()

dict_keys(['data', 'meta'])

In [23]:
response["meta"]

{'result_count': 1000, 'next_token': 'TETQSN75D2SHEZZZ'}

In [24]:
# Call the function again to get another 1,000 followers:
response_2 = ot.get_followers(
    '12',
    pagination_token=response['meta']['next_token'], 
    max_results = 1000
)
response_2["meta"]

{'result_count': 1000,
 'next_token': 'QVFNO5VQ9QSHEZZZ',
 'previous_token': 'OKORND13IT3EGZZZ'}

In [25]:
help(ot.user_lookup_ids)

Help on method user_lookup_ids in module osometweet.api:

user_lookup_ids(user_ids: Union[list, tuple], *, everything: bool = False, fields: osometweet.fields.ObjectFields = None, expansions: osometweet.expansions.UserExpansions = None) -> dict method of osometweet.api.OsomeTweet instance
    Looks-up user account information using unique user account id numbers.
    User fields included by default match the default parameters returned
    by Twitter.
    
    Ref: https://developer.twitter.com/en/docs/twitter-api/users/lookup/api-reference/get-users
    
    Parameters:
    ----------
    - user_ids (list, tuple) - unique user ids to include in query (max
        100)
    - everything: (bool) - if True, return all fields and expansions.
        (default = False)
    - user_fields (list, tuple) - the user fields included in returned
        data. (Default = "id", "name", "username")
    - fields: (ObjectFields) - additional fields to return. (default =
        None)
    - expansions: (
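
The help text above notes that `user_ids` accepts at most 100 IDs per query, so longer lists have to be split before they are passed to `user_lookup_ids`. Here is a minimal sketch of that splitting. The contents list a `chunker` utility in `osometweet`, but its signature is not shown in this section, so the sketch uses a hypothetical stand-in named `batches`.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# 250 made-up IDs, used only to show the batch sizes
user_ids = [str(n) for n in range(250)]
sizes = [len(batch) for batch in batches(user_ids)]
print(sizes)  # [100, 100, 50]

# With a real client, each batch could then be looked up in turn:
# for batch in batches(user_ids):
#     response = ot.user_lookup_ids(batch)
```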

---
<a id=streaming></a>
## Streaming

Twitter offers two different **streaming endpoints** to gather tweets in real-time:
1. [Filtered stream](https://developer.twitter.com/en/docs/twitter-api/tweets/filtered-stream/introduction) : The filtered stream endpoint enables developers to filter the real-time stream of public Tweets. 
    * There are also filtered stream endpoints that enable you to create and manage matching rules, and apply those rules to filter a stream of real-time Tweets that will return matching public Tweets. For example, you can request all tweets which include the word "politics" or some other string.
2. [Sampled stream](https://developer.twitter.com/en/docs/twitter-api/tweets/sampled-stream/introduction) : The sampled stream endpoint delivers a roughly 1% random sample of publicly available Tweets in real-time.

> Note: The streaming endpoints cannot be used with the Rate Limit Manager tool. Thus, during authorization the `manage_rate_limits` parameter must be set to `False`. See [Adding filter rules](#adding-filter-rules) for an example.

### Contents
- [Filtered streaming](#filtered-streaming)
  - [Adding filter rules](#adding-filter-rules)
  - [Retrieving filter rules](#retrieving-filter-rules)
  - [Connecting to the filtered stream endpoint](#connecting-to-the-filtered-stream-endpoint)
  - [Deleting filter rules](#deleting-filter-rules)
- [Sampled streaming](#sampled-streaming)
  - [Connecting to the sampled stream endpoint](#connecting-to-the-sampled-stream-endpoint)

<a id=filtered-streaming></a>
## Filtered streaming

There are three different `osometweet` methods that will help you stream real-time filtered public tweets.

|Type|`osometweet` method| Purpose | Twitter endpoint|
|----|-------------------|---------|-----------------|
|Streaming|`filtered_stream`|Connect to the stream| `GET /2/tweets/search/stream` |
|Management|`set_filtered_stream_rule`|Add or delete rules from your stream| `POST /2/tweets/search/stream/rules` |
|Management|`get_filtered_stream_rule`|Retrieve your stream's rules| `GET /2/tweets/search/stream/rules` |

To utilize the `filtered_stream` endpoint, we must first understand how to manage the _matching rules_. Matching rules are the criteria that tell Twitter which tweets we want it to send us.

For example, if we wanted only tweets that contain specific keywords, such as "coronavirus" or "indiana", we would need to create matching rules that tell Twitter to do exactly that. Here is what that looks like.

<a id=adding-filter-rules></a>
### Adding filter rules

To add filter rules, we use the `set_filtered_stream_rule` method.

In [26]:
oauth2 = osometweet.OAuth2(
    bearer_token=bearer_token,
    manage_rate_limits=False    # <~~~ Must be set to False!!
)
ot = osometweet.OsomeTweet(oauth2)

# Add streaming rules
rules = [{"value": "coronavirus", "tag": "all coronavirus tweets"},
         {"value": "indiana", "tag": "all indiana tweets"}]
add_rules = {"add": rules}

response = ot.set_filtered_stream_rule(rules=add_rules) #<~~~ Where the magic happens!

print("API response from adding two rules:\n")
response

API response from adding two rules:



{'data': [{'value': 'coronavirus',
   'tag': 'all coronavirus tweets',
   'id': '1432400787280379906'},
  {'value': 'indiana',
   'tag': 'all indiana tweets',
   'id': '1432400787280379905'}],
 'meta': {'sent': '2021-08-30T17:52:16.466Z',
  'summary': {'created': 2, 'not_created': 0, 'valid': 2, 'invalid': 0}}}

#### Understand adding filter rules
We highly recommend you check out Twitter's [own documentation](https://developer.twitter.com/en/docs/twitter-api/tweets/filtered-stream/integrate/build-a-rule) on how to build a rule. Also, see [Building High Quality Filters](https://developer.twitter.com/en/docs/tutorials/building-high-quality-filters) for a more in-depth review.

Nonetheless, we provide a basic explanation of how adding rules works and their structure to get you up and running.

Rules are added as a list of dictionaries, each with the keys `value` and `tag`. Each dictionary in that list makes up one rule, and its keys mean the following:

- `value` : The matching criteria
    - Twitter returns tweets that match this value's input. See the links above to learn about the different ways to match tweets.
- `tag` : A label for the matching rule in that dictionary
    - This doesn't affect the actual tweets that are returned, however, if you have many rules, creating simple tags can be helpful should you want to find and delete specific rules (see [Deleting filter rules](#deleting-filter-rules) for more information on this).

So the endpoint takes in something like the below (which you can see we created above in the [Adding filter rules](#adding-filter-rules) section).

```python
{'add': [
    {'value': 'coronavirus', 'tag': 'all coronavirus tweets'},
    {'value': 'indiana', 'tag': 'all indiana tweets'}
]}
```

The top-level key `add` tells Twitter that we are adding rules and feeds the list as input of what to add.
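
When there are many rules, it can be easier to keep them as a simple tag-to-value mapping and build the payload from it. Below is a minimal sketch that produces the `add` structure described above. The helper name `build_add_payload` is hypothetical and is not part of `osometweet`.

```python
def build_add_payload(rules_by_tag: dict) -> dict:
    """Turn a {tag: value} mapping into the {'add': [...]} structure described above."""
    rules = []
    for tag, value in rules_by_tag.items():
        # An empty matching value cannot match anything, so reject it early
        if not value or not value.strip():
            raise ValueError(f"Rule {tag!r} has an empty matching value")
        rules.append({"value": value, "tag": tag})
    return {"add": rules}


payload = build_add_payload({
    "all coronavirus tweets": "coronavirus",
    "all indiana tweets": "indiana",
})
print(payload)

# The result could then be passed on as in the cell above:
# response = ot.set_filtered_stream_rule(rules=payload)
```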

<a id=retrieving-filter-rules></a>
### Retrieving filter rules

Now, if we wanted to check that the rules added during the [Adding filter rules](#adding-filter-rules) section are actually there, we can use the `get_filtered_stream_rule` method.

We can do this like so...

In [27]:
current_rules = ot.get_filtered_stream_rule()
print("API response when retrieving current rules:\n")
current_rules


API response when retrieving current rules:



{'data': [{'id': '1432400787280379905',
   'value': 'indiana',
   'tag': 'all indiana tweets'},
  {'id': '1432400787280379906',
   'value': 'coronavirus',
   'tag': 'all coronavirus tweets'}],
 'meta': {'sent': '2021-08-30T17:52:16.673Z'}}

We can see here that our rules are included under the `data` key. The `value` and `tag` keys are included exactly as we passed them, and each rule also includes a unique identifier key, `id`.

> Note: a new `id` is assigned each time a rule is created. If you add the rule that matches "coronavirus", delete all of your rules, and then recreate that exact same rule, its `id` will not be the same.
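
Because the `id` values change whenever rules are recreated, a check that the expected rules are active is more robust when it compares `value` and `tag` only. The following is a minimal sketch of such a check, run against the response shown above. The helper name `missing_rules` is a hypothetical choice.

```python
def missing_rules(expected: list, response: dict) -> list:
    """Return the expected rules that do not appear in a rules response."""
    active = {(rule["value"], rule.get("tag")) for rule in response.get("data", [])}
    return [rule for rule in expected if (rule["value"], rule.get("tag")) not in active]


expected = [
    {"value": "coronavirus", "tag": "all coronavirus tweets"},
    {"value": "indiana", "tag": "all indiana tweets"},
]

# Copy of the response printed above
current_rules = {
    "data": [
        {"id": "1432400787280379905", "value": "indiana", "tag": "all indiana tweets"},
        {"id": "1432400787280379906", "value": "coronavirus", "tag": "all coronavirus tweets"},
    ],
    "meta": {"sent": "2021-08-30T17:52:16.673Z"},
}

print(missing_rules(expected, current_rules))  # []
```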

<a id=connecting-to-the-filtered-stream-endpoint></a>
### Connecting to the filtered stream endpoint

Now that we have successfully added some matching rules, and we are confident they are there, we can connect to the streaming endpoint and begin gathering tweets. Here is how we do that...

In [28]:
import json

# Returns a streaming response object
stream = ot.filtered_stream()

# We iterate over the stream line by line; each non-empty line holds one tweet
for tweet in stream.iter_lines():

    # Skip blank keep-alive lines, which cannot be parsed as json
    if not tweet:
        continue

    # Then, we read the json `tweet` object as a dictionary and select the `data`
    # Note: if it's not there for some reason, this line returns `None`...
    data = json.loads(tweet).get("data")

    # ... but if we do find data, we can then print each tweet
    if data:
        print(data)

{'id': '1432400768829693958', 'text': 'RT @SykesCharlie: “Dr. Mark McDonald of Los Angeles is among a fringe group of outspoken medical professionals who have pushed ivermectin a…'}
{'id': '1432400766342475780', 'text': 'RT @elsalvador: #DePaís Así la actividad en el #MegacentroDeVacunación del @HospitalSV, en el que cientos de personas acuden para aplicarse…'}
{'id': '1432400769253249031', 'text': 'RT @BernieSpofforth: This was always coming.\n\nThe unvaccinated to be locked down, discriminated against and treated as the criminally uncle…'}
{'id': '1432400766631825413', 'text': 'RT @WUTangKids: Duke for the win: \n\nFaculty and staff must show proof of vaccination by Oct. 1 or get a 7 day notice to pack up and hit the…'}
{'id': '1432400765449117697', 'text': 'You’re tellin me that instead of you bein the typical gunslinger, I could’ve shared a space prison with INDIANA JONES???'}
{'id': '1432400766233309184', 'text': '@ZeeNews Virus expert hota hai khud ka structure badlne mein..tho a

KeyboardInterrupt: 
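
The cell above runs until it is interrupted by hand. Here is a minimal sketch of stopping after a fixed number of tweets instead, using `itertools.islice`. The function name `iter_tweet_data` is hypothetical, and `fake_lines` stands in for `stream.iter_lines()` so that the sketch runs offline. It assumes that the stream may contain blank keep-alive lines and objects without a `data` key.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def iter_tweet_data(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield the `data` payload of every non-empty json line that has one."""
    for line in lines:
        if not line:  # blank keep-alive line, nothing to parse
            continue
        data = json.loads(line).get("data")
        if data:
            yield data


# Stand-in for `stream.iter_lines()`
fake_lines = [
    b'{"data": {"id": "1", "text": "first"}}',
    b"",
    b'{"errors": [{"title": "example"}]}',
    b'{"data": {"id": "2", "text": "second"}}',
    b'{"data": {"id": "3", "text": "third"}}',
]

# Take only the first two tweets, then stop reading
first_two = list(islice(iter_tweet_data(fake_lines), 2))
print(first_two)
```

With a real stream, `iter_tweet_data(stream.iter_lines())` could be used in the same way.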

<a id=deleting-filter-rules></a>
### Deleting filter rules

To delete filter rules, we use the `set_filtered_stream_rule` method again.

To delete rules, we need to provide a list of the `id`s of the rules that we'd like to delete. So if we have a `current_rules` object that represents the above dictionary, we can collect all of the rule ids into a list with a short list comprehension.

In [29]:
current_rules = ot.get_filtered_stream_rule()
current_rules

{'data': [{'id': '1432400787280379905',
   'value': 'indiana',
   'tag': 'all indiana tweets'},
  {'id': '1432400787280379906',
   'value': 'coronavirus',
   'tag': 'all coronavirus tweets'}],
 'meta': {'sent': '2021-08-30T17:52:28.658Z'}}

In [30]:
all_rule_ids = [rule["id"] for rule in current_rules["data"]]
all_rule_ids

['1432400787280379905', '1432400787280379906']

In [31]:
delete_rule = {'delete': {'ids':all_rule_ids}}
ot.set_filtered_stream_rule(rules=delete_rule)

{'meta': {'sent': '2021-08-30T17:52:30.800Z',
  'summary': {'deleted': 2, 'not_deleted': 0}}}

Notice that we needed to embed the list of ids inside of a dictionary prior to passing it to the method. Just like adding filter rules, the first key of this dictionary tells Twitter what action it should be doing - i.e., `delete` tells Twitter to remove rules, based on the list of `ids` provided.
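
Tags make it possible to delete only some rules rather than all of them. Below is a minimal sketch that selects rule ids by tag and builds the `delete` dictionary described above. The helper name `build_delete_payload` is hypothetical and is not part of `osometweet`.

```python
def build_delete_payload(current_rules: dict, tags: set) -> dict:
    """Build a delete payload for every rule whose tag is in `tags`."""
    ids = [
        rule["id"]
        for rule in current_rules.get("data", [])
        if rule.get("tag") in tags
    ]
    return {"delete": {"ids": ids}}


# Copy of the rules response printed above
current_rules = {
    "data": [
        {"id": "1432400787280379905", "value": "indiana", "tag": "all indiana tweets"},
        {"id": "1432400787280379906", "value": "coronavirus", "tag": "all coronavirus tweets"},
    ],
    "meta": {"sent": "2021-08-30T17:52:28.658Z"},
}

payload = build_delete_payload(current_rules, {"all indiana tweets"})
print(payload)  # {'delete': {'ids': ['1432400787280379905']}}

# The result could then be passed on as in the cell above:
# ot.set_filtered_stream_rule(rules=payload)
```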

<a id=sampled-streaming></a>
## Sampled streaming

We can access the sampled streaming endpoint with the `sampled_stream` method.

<a id=connecting-to-the-sampled-stream-endpoint></a>
### Connecting to the sampled stream endpoint

As this endpoint doesn't take any matching criteria and simply returns a general 1% sample, there is much less to think about and we can begin collecting tweets from the sampled stream in the following way...

In [32]:
# Returns a streaming response object
stream = ot.sampled_stream()

# We iterate over the stream line by line; each non-empty line holds one tweet
for tweet in stream.iter_lines():

    # Skip blank keep-alive lines, which cannot be parsed as json
    if not tweet:
        continue

    # Then, we read the json `tweet` object as a dictionary and select the `data`
    # Note: if it's not there for some reason, this line returns `None`...
    data = json.loads(tweet).get("data")

    # ... but if we do find data, we can then print each tweet
    if data:
        print(data)

{'id': '1432400840116129796', 'text': '@YuriyATL We didn’t.. half of us did. (52.3%) and of that 52% most are in blue leaning areas.. meaning most of the unvaccinated live in patches of territory together.. allowing the virus to spread and mutate. It’s been called “pandemic of the unvaccinated”. The work isn’t done.'}
{'id': '1432400840120315915', 'text': '@DummblondGaming I have explored the hub a lot and its really cool and has a ton of hidden secrets! 👏'}
{'id': '1432400844293558277', 'text': 'RT @gabahn: tranquiiila como um vulcao\n\n      se joga no vulcao\n\n            nao\n\n                 ninguém seguraaaa'}
{'id': '1432400840124469256', 'text': 'RT @thinkjxdn: BABYY https://t.co/7oFpxKi5SV'}
{'id': '1432400844293611525', 'text': '@tefilorran 🤬'}
{'id': '1432400844293562368', 'text': 'He’s just put Liam Gallagher live at Reading on with out me even asking… I’ve found the man I plan on marrying 🤞🏼😂😍'}
{'id': '1432400844322902018', 'text': 'RT @BANGTANHIIT: [📊] - "Butter" vend

KeyboardInterrupt: 

<a id=search></a>
## Search

<a id=recent-search></a>
### Recent search

In [33]:
ot.search(
    query="grumpy cat"
)

{'data': [{'id': '1432398602526097410',
   'text': 'RT @KhemBey913: So true 😂🤣😂 one of his favorite saying is don’t touch me 😂 old grumpy cat 😂🤣😂😼'},
  {'id': '1432396542992916488',
   'text': '@SlimMo23 @ArtReemi The grumpy cat that didn’t get fed quick enough is cuter 😂'},
  {'id': '1432392660229279746', 'text': '@M0NST3RC0CK Grumpy cat'},
  {'id': '1432380604864155652',
   'text': '@shii1690 Que bonita gata, se parece a Grumpy Cat😄'},
  {'id': '1432369126727634946',
   'text': '@HermitThrush7 @codiecrowley I also don\'t think most cats are jerks. some can be a bit grumpy, but I think with proper handling and love they can come out of their shells. but yeah you\'re 100% right, people will have one bad encounter with a cat and be like "nope, hate them."'},
  {'id': '1432362836634841095',
   'text': '@bluebellyears @MailOnline Hmm 🤔 less impressive when you read it was built on her parents farm with help from her property developer father 🤨 \nI’m glad they’re happy, but it’s just rich 

<a id=full-archive-search></a>
### Full archive search

In [34]:
from osometweet.utils import convert_date_to_iso

start = convert_date_to_iso("2020-01-01")
end = convert_date_to_iso("2020-02-01")

print(start)

2020-01-01T00:00:00Z
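
For readers curious about the timestamp format printed above, here is a minimal standard-library sketch that produces the same string from a `YYYY-MM-DD` date. It is a stand-in written for illustration, not the implementation of `convert_date_to_iso`, and the name `date_to_iso_utc` is hypothetical.

```python
from datetime import datetime, timezone


def date_to_iso_utc(date_string: str) -> str:
    """Format a YYYY-MM-DD date as midnight UTC, e.g. 2020-01-01T00:00:00Z."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


print(date_to_iso_utc("2020-01-01"))  # 2020-01-01T00:00:00Z
```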


In [35]:
response = ot.search(
    query="grumpy cat",
    start_time=start,
    end_time=end,
    full_archive_search=True,
    max_results=10,
    everything=True
)
response

{'data': [{'author_id': '1216520433815691265',
   'lang': 'en',
   'text': '@RealGrumpyCat \nRIP Grumpy Cat',
   'possibly_sensitive': False,
   'source': 'Twitter Web App',
   'created_at': '2020-01-31T23:50:49.000Z',
   'entities': {'mentions': [{'start': 0,
      'end': 14,
      'username': 'RealGrumpyCat',
      'id': '860237030'}]},
   'id': '1223393226805125120',
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'reply_settings': 'everyone',
   'in_reply_to_user_id': '860237030',
   'conversation_id': '1223393226805125120'},
  {'author_id': '613642580',
   'referenced_tweets': [{'type': 'retweeted', 'id': '1223272233977765889'}],
   'lang': 'en',
   'text': 'RT @ABC: A grumpy cat dubbed the "World\'s Worst" has found herself a happy family. https://t.co/bOc5WvSzmC',
   'possibly_sensitive': False,
   'source': 'Twitter for Android',
   'created_at': '2020-01-31T23:42:08.000Z',
   'context_annotations': [{'domain': {'id

In [36]:
response.keys()

dict_keys(['data', 'includes', 'errors', 'meta'])

In [37]:
response["includes"]

{'users': [{'username': 'TimetravelerThe',
   'created_at': '2020-01-13T00:40:58.000Z',
   'description': '',
   'id': '1216520433815691265',
   'url': '',
   'name': 'The TimeTraveler',
   'profile_image_url': 'https://pbs.twimg.com/profile_images/1222854602808340482/b6u4eTd2_normal.jpg',
   'verified': False,
   'protected': False,
   'public_metrics': {'followers_count': 3,
    'following_count': 51,
    'tweet_count': 93,
    'listed_count': 0}},
  {'username': 'RealGrumpyCat',
   'created_at': '2012-10-03T19:42:41.000Z',
   'entities': {'url': {'urls': [{'start': 0,
       'end': 23,
       'url': 'https://t.co/GSwxT4CXMs',
       'expanded_url': 'https://www.grumpycats.com/shop',
       'display_url': 'grumpycats.com/shop'}]},
    'description': {'urls': [{'start': 78,
       'end': 101,
       'url': 'https://t.co/tHU9DEs4TV',
       'expanded_url': 'http://Instagram.com/realgrumpycat',
       'display_url': 'Instagram.com/realgrumpycat'},
      {'start': 102,
       'end': 125,

In [38]:
response["errors"]

[{'parameter': 'entities.mentions.username',
  'resource_id': 'realDonaldTrump',
  'value': 'realDonaldTrump',
  'detail': 'User has been suspended: [realDonaldTrump].',
  'title': 'Forbidden',
  'resource_type': 'user',
  'type': 'https://api.twitter.com/2/problems/resource-not-found'},
 {'value': '1223378807383564288',
  'detail': 'Could not find tweet with referenced_tweets.id: [1223378807383564288].',
  'title': 'Not Found Error',
  'resource_type': 'tweet',
  'parameter': 'referenced_tweets.id',
  'resource_id': '1223378807383564288',
  'type': 'https://api.twitter.com/2/problems/resource-not-found'}]
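
Each entry in `errors` is a plain dictionary, so you can summarize the list with ordinary Python. The sketch below uses an abridged copy of the two entries shown above. Note that `summarize_errors` is a hypothetical helper written for this illustration and is not part of `osometweet`.

```python
from collections import Counter

# Abridged copy of the `errors` list shown above
errors = [
    {
        "resource_type": "user",
        "resource_id": "realDonaldTrump",
        "title": "Forbidden",
        "detail": "User has been suspended: [realDonaldTrump].",
    },
    {
        "resource_type": "tweet",
        "resource_id": "1223378807383564288",
        "title": "Not Found Error",
        "detail": "Could not find tweet with referenced_tweets.id: [1223378807383564288].",
    },
]


def summarize_errors(errors):
    """Count errors by (resource_type, title). Hypothetical helper."""
    return Counter((e.get("resource_type"), e.get("title")) for e in errors)


summary = summarize_errors(errors)
for (resource_type, title), count in summary.items():
    print(f"{resource_type}: {title} ({count})")
```

A summary like this makes it easier to spot, for example, how many referenced tweets or mentioned users could not be returned.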

In [39]:
response["meta"]

{'newest_id': '1223393226805125120',
 'oldest_id': '1223373908679639045',
 'result_count': 10,
 'next_token': 'b26v89c19zqg8o3fo71f2o2fqyti4qnidrms9qaahs2nx'}

<a id=fields-and-expansions></a>
## Fields and expansions
<a id=give-me-everything></a>
### Give me everything

If you want all the data fields that Twitter has to offer, follow the example below.

In [40]:
# make request
ot.tweet_lookup('1348419350370398209', everything=True)

{'data': [{'text': 'Treaties, like alliances, can outlast their sell-by-date. Always have to keep evaluating their usefulness. https://t.co/up8Qtj6gn8',
   'reply_settings': 'everyone',
   'conversation_id': '1348419350370398209',
   'entities': {'urls': [{'start': 107,
      'end': 130,
      'url': 'https://t.co/up8Qtj6gn8',
      'expanded_url': 'https://twitter.com/SecPompeo/status/1348419350370398209/photo/1',
      'display_url': 'pic.twitter.com/up8Qtj6gn8'}]},
   'created_at': '2021-01-11T00:00:00.000Z',
   'context_annotations': [{'domain': {'id': '10',
      'name': 'Person',
      'description': 'Named people in the world like Nelson Mandela'},
     'entity': {'id': '936263589509263360',
      'name': 'Mike Pompeo',
      'description': 'US Secretary of State, Mike Pompeo'}},
    {'domain': {'id': '35',
      'name': 'Politician',
      'description': 'Politicians in the world, like Joe Biden'},
     'entity': {'id': '936263589509263360',
      'name': 'Mike Pompeo',
      '

`everything=True` works for all `osometweet` endpoints.

In [41]:
ot.user_lookup_usernames(["mdeverna2"], everything=True)

{'data': [{'username': 'mdeverna2',
   'name': 'Matthew DeVerna',
   'pinned_tweet_id': '1364255699996471298',
   'url': 'https://t.co/1QYmEi2pIM',
   'profile_image_url': 'https://pbs.twimg.com/profile_images/1314948204916617216/z9HvE_xt_normal.jpg',
   'public_metrics': {'followers_count': 81,
    'following_count': 312,
    'tweet_count': 528,
    'listed_count': 2},
   'description': 'Matt DeVerna. Ph.D. student - Indiana University Bloomington Informatics, Complex Networks and Systems. OSoMe Knight Fellow studying misinformation.',
   'created_at': '2020-10-04T20:24:15.000Z',
   'verified': False,
   'entities': {'url': {'urls': [{'start': 0,
       'end': 23,
       'url': 'https://t.co/1QYmEi2pIM',
       'expanded_url': 'http://matthewdeverna.com',
       'display_url': 'matthewdeverna.com'}]}},
   'protected': False,
   'id': '1312850357555539972'}],
 'includes': {'tweets': [{'possibly_sensitive': False,
    'lang': 'en',
    'text': 'Thrilled to announce the official launch o

<a id=all-one-field></a>
### Get all from a specific field
You can also retrieve all elements from specific object fields. The available object fields are:
- `UserFields`
- `TweetFields`
- `MediaFields`
- `PlaceFields`
- `PollFields`

In [42]:
import osometweet.fields as o_fields

all_user_fields = o_fields.UserFields(everything=True)
print(all_user_fields)

ot.user_lookup_usernames(['mdeverna2'], fields=all_user_fields)

id,name,username,created_at,description,entities,location,pinned_tweet_id,profile_image_url,protected,public_metrics,url,verified,withheld


{'data': [{'username': 'mdeverna2',
   'created_at': '2020-10-04T20:24:15.000Z',
   'entities': {'url': {'urls': [{'start': 0,
       'end': 23,
       'url': 'https://t.co/1QYmEi2pIM',
       'expanded_url': 'http://matthewdeverna.com',
       'display_url': 'matthewdeverna.com'}]}},
   'description': 'Matt DeVerna. Ph.D. student - Indiana University Bloomington Informatics, Complex Networks and Systems. OSoMe Knight Fellow studying misinformation.',
   'id': '1312850357555539972',
   'url': 'https://t.co/1QYmEi2pIM',
   'name': 'Matthew DeVerna',
   'profile_image_url': 'https://pbs.twimg.com/profile_images/1314948204916617216/z9HvE_xt_normal.jpg',
   'verified': False,
   'pinned_tweet_id': '1364255699996471298',
   'protected': False,
   'public_metrics': {'followers_count': 81,
    'following_count': 312,
    'tweet_count': 528,
    'listed_count': 2}}]}

In [43]:
all_tweet_fields = o_fields.TweetFields(everything=True)
print(all_tweet_fields)

ot.tweet_lookup('1348419350370398209', fields=all_tweet_fields)

id,text,attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,in_reply_to_user_id,lang,possibly_sensitive,public_metrics,referenced_tweets,reply_settings,source,withheld


{'data': [{'attachments': {'media_keys': ['3_1348399526705586183']},
   'possibly_sensitive': False,
   'context_annotations': [{'domain': {'id': '10',
      'name': 'Person',
      'description': 'Named people in the world like Nelson Mandela'},
     'entity': {'id': '936263589509263360',
      'name': 'Mike Pompeo',
      'description': 'US Secretary of State, Mike Pompeo'}},
    {'domain': {'id': '35',
      'name': 'Politician',
      'description': 'Politicians in the world, like Joe Biden'},
     'entity': {'id': '936263589509263360',
      'name': 'Mike Pompeo',
      'description': 'US Secretary of State, Mike Pompeo'}}],
   'id': '1348419350370398209',
   'text': 'Treaties, like alliances, can outlast their sell-by-date. Always have to keep evaluating their usefulness. https://t.co/up8Qtj6gn8',
   'entities': {'urls': [{'start': 107,
      'end': 130,
      'url': 'https://t.co/up8Qtj6gn8',
      'expanded_url': 'https://twitter.com/SecPompeo/status/1348419350370398209/photo/1

<a id=specific-fields></a>
### Include specific fields and expansions

`OSoMeTweet` provides the flexibility to specify exactly what the API should return.

Let us use the `tweet_lookup` endpoint as an example.

Suppose we are interested in a tweet with the unique tweet ID number, `1212092628029698048`. 

In addition to the default tweet data, we also want to know:
1. When it was created (captured in the `created_at` field)
2. How popular it was (captured in the `public_metrics` field with information like retweet counts, etc.)
3. The author of the tweet (so we will need to _expand_ the `author_id` field)
4. When the author created their account (so we also need to request the `created_at` field as a user field)

To retrieve all of this information, we simply specify these specific tweet and user fields in our query.

In [44]:
import osometweet.fields as o_fields
import osometweet.expansions as o_expansions

# Initialize the fields object
tweet_fields = o_fields.TweetFields()

# Specify the tweet fields you need
tweet_fields.fields = ['public_metrics', 'created_at']

# Initialize the expansion object
expansions = o_expansions.TweetExpansions()

# Specify the expansions you need
expansions.expansions = ["author_id"]

# Initialize the user fields object
user_fields = o_fields.UserFields()

# Specify the fields you need
user_fields.fields = ['created_at']

# make request
ot.tweet_lookup(
    tids = ['1212092628029698048'],
    fields = tweet_fields+user_fields,
    expansions=expansions
)

{'data': [{'text': 'We believe the best future version of our API will come from building it with YOU. Here’s to another great year with everyone who builds on the Twitter platform. We can’t wait to continue working with you in the new year. https://t.co/yvxdK6aOo2',
   'created_at': '2019-12-31T19:26:16.000Z',
   'id': '1212092628029698048',
   'public_metrics': {'retweet_count': 7,
    'reply_count': 3,
    'like_count': 39,
    'quote_count': 1},
   'author_id': '2244994945'}],
 'includes': {'users': [{'id': '2244994945',
    'name': 'Twitter Dev',
    'created_at': '2013-12-14T04:35:55.000Z',
    'username': 'TwitterDev'}]}}
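
To build some intuition for what these objects represent, it may help to see the same choices written as raw query parameters. The parameter names (`ids`, `tweet.fields`, `expansions`, `user.fields`) and the `/2/tweets` URL come from the Twitter API v2 documentation. How `osometweet` assembles its requests internally is not shown here, so treat this as an illustrative sketch rather than the library's code.

```python
from urllib.parse import urlencode

# Parameters mirroring the request above
params = {
    "ids": "1212092628029698048",
    "tweet.fields": "created_at,public_metrics",
    "expansions": "author_id",
    "user.fields": "created_at",
}

# Keep commas readable instead of percent-encoding them
query_string = urlencode(params, safe=",")
url = f"https://api.twitter.com/2/tweets?{query_string}"
print(url)
```

Seeing the parameters side by side makes the division of labor clearer: `tweet.fields` shapes the objects in `data`, while `expansions` and `user.fields` together shape what arrives in `includes`.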

<a id=more-on-fields></a>
### More on fields

Twitter [supports](https://developer.twitter.com/en/docs/twitter-api/fields) fields for `user`, `tweet`, `media`, `poll`, and `place`.
You can use `UserFields`, `TweetFields`, `MediaFields`, `PollFields`, and `PlaceFields` classes to handle them, respectively.
Unless you specify otherwise, these objects contain only the default fields.

You can see which optional fields are available through the `optional_fields` attribute:

In [45]:
import osometweet.fields as o_fields

tweet_fields = o_fields.TweetFields()
tweet_fields.optional_fields

['attachments',
 'author_id',
 'context_annotations',
 'conversation_id',
 'created_at',
 'entities',
 'geo',
 'in_reply_to_user_id',
 'lang',
 'possibly_sensitive',
 'public_metrics',
 'referenced_tweets',
 'reply_settings',
 'source',
 'withheld']

In [46]:
tweet_fields.default_fields

['id', 'text']

You can specify the fields by assigning a list to the `fields` attribute:

In [47]:
tweet_fields.fields= ['public_metrics', 'created_at']
tweet_fields.fields

['created_at', 'public_metrics']

You can add different fields objects together to get a single object that contains all of the information, and then pass it to the API endpoints.

In [48]:
import osometweet.fields as o_fields

tweet_fields = o_fields.TweetFields()
tweet_fields.fields = ['public_metrics', 'created_at']

user_fields = o_fields.UserFields()
user_fields.fields = ['created_at']

sum_of_fields = tweet_fields + user_fields
# OR
# sum_of_fields = sum([tweet_fields, user_fields])

print(type(sum_of_fields))
sum_of_fields

<class 'osometweet.fields.ObjectFields'>


{'tweet.fields': 'created_at,public_metrics', 'user.fields': 'created_at'}

Note: We include the `user.fields` object here, but Twitter does not return the user data because we do not include the `author_id` expansion. Always double-check that you're asking Twitter for the right information!

In [49]:
ot.tweet_lookup(
    tids = ['1212092628029698048'],
    fields = sum_of_fields
)

{'data': [{'text': 'We believe the best future version of our API will come from building it with YOU. Here’s to another great year with everyone who builds on the Twitter platform. We can’t wait to continue working with you in the new year. https://t.co/yvxdK6aOo2',
   'created_at': '2019-12-31T19:26:16.000Z',
   'public_metrics': {'retweet_count': 7,
    'reply_count': 3,
    'like_count': 39,
    'quote_count': 1},
   'id': '1212092628029698048'}]}
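
One way to catch this kind of mismatch early is to check whether the response actually contains an `includes` object. The sketch below uses abridged copies of the two responses from this section. `included_users` is a hypothetical helper written for this illustration and is not part of `osometweet`.

```python
def included_users(response):
    """Return expanded user objects, or an empty list if none came back."""
    return response.get("includes", {}).get("users", [])


# Abridged response from the request that used the `author_id` expansion
with_expansion = {
    "data": [{"id": "1212092628029698048", "author_id": "2244994945"}],
    "includes": {
        "users": [
            {
                "id": "2244994945",
                "username": "TwitterDev",
                "created_at": "2013-12-14T04:35:55.000Z",
            }
        ]
    },
}

# Abridged response from the request without the expansion
without_expansion = {"data": [{"id": "1212092628029698048"}]}

print(len(included_users(with_expansion)))
print(len(included_users(without_expansion)))
```

An empty list here is a hint that an expansion is missing from the request, even though the corresponding fields were requested.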

<a id=utility-functions></a>
## Utility Functions

We also include a few utility methods that will (hopefully) make working with the new Twitter API structure a bit easier.

You can import the utility methods into your environment with `import osometweet.utils as o_utils`, as in the examples below.

<a id=pause-until></a>
#### `o_utils.pause_until`
Managing time is an important aspect of gathering data from Twitter, and often you'd just like to wait until some specified time. This is relatively easy with the `time` module, but it is even easier with the `pause_until()` method. Simply input the time that you would like to pause your code until, and the method handles the rest. This method can take in a `datetime` object or a Unix epoch timestamp. For example, if you'd like to wait five seconds, you can simply do...

In [50]:
import osometweet.utils as o_utils
import datetime

# The below line of code takes the time at the current moment, converts it to an epoch time-stamp
# and then adds five seconds to it.
now_plus_5_with_epoch_timestamp = datetime.datetime.now().timestamp() + 5

print("timestamp:", now_plus_5_with_epoch_timestamp)

# Then we input that into the pause_until() method and your machine will
# sleep until that specific time, five seconds later
print("Time before call:",datetime.datetime.now())

o_utils.pause_until(now_plus_5_with_epoch_timestamp)

print("Time after call:",datetime.datetime.now())

timestamp: 1630346064.260082
Time before call: 2021-08-30 13:54:19.260507
Time after call: 2021-08-30 13:54:24.260306


If you'd like to do this with a `datetime` object, it looks like this...

In [51]:
import osometweet.utils as o_utils
import datetime

# The timedelta method takes input in the following way...
# timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)
now_plus_5_with_datetime_object = datetime.datetime.now() + datetime.timedelta(seconds=5)

print("datetime object:", now_plus_5_with_datetime_object)

print("Time before call:",datetime.datetime.now())

o_utils.pause_until(now_plus_5_with_datetime_object)

print("Time after call:",datetime.datetime.now())

datetime object: 2021-08-30 13:54:29.269534
Time before call: 2021-08-30 13:54:24.269817
Time after call: 2021-08-30 13:54:29.269784
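
If you are curious how a helper like this can work, here is a minimal sketch built only on the standard library. It is not the `osometweet` implementation, and `pause_until_sketch` is a hypothetical name. It accepts either a `datetime` object or a Unix epoch timestamp, as described above.

```python
import datetime
import time


def pause_until_sketch(target):
    """Sleep until `target`, given as a datetime or a Unix epoch timestamp."""
    # Convert datetime objects to an epoch timestamp so both inputs share one path
    if isinstance(target, datetime.datetime):
        target = target.timestamp()

    # Only sleep if the target time is still in the future
    remaining = target - time.time()
    if remaining > 0:
        time.sleep(remaining)


start = time.time()

# Pause using an epoch timestamp
pause_until_sketch(start + 0.5)

# Pause using a datetime object
pause_until_sketch(datetime.datetime.now() + datetime.timedelta(seconds=0.5))

elapsed = time.time() - start
print(f"Elapsed: {elapsed:.2f} seconds")
```

Passing a time that has already gone by returns immediately in this sketch, which is convenient when the target comes from a rate-limit reset value that may already have elapsed.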


<a id=chunker></a>
#### `o_utils.chunker`
Another reality of working with Twitter data is that you are only allowed to query Twitter with a maximum number of users/tweets/whatever per endpoint. To deal with this, we created the `o_utils.chunker` method, which turns a list into a list of smaller lists, each no longer than the size you indicate. For example...

In [52]:
from osometweet import utils as o_util
my_list = ["user1", "user2", "user3", "user4", "user5", "user6", "user7", "user8", "user9"]
chunked_list = o_util.chunker(seq=my_list, size=2)
print(chunked_list)

[['user1', 'user2'], ['user3', 'user4'], ['user5', 'user6'], ['user7', 'user8'], ['user9']]
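
The same idea can be expressed in a few lines of plain Python. This is a sketch for intuition, not the library's code, and `chunk_list` is a hypothetical name.

```python
def chunk_list(seq, size):
    """Split `seq` into consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    return [list(seq[i:i + size]) for i in range(0, len(seq), size)]


user_ids = [f"user{i}" for i in range(1, 10)]
batches = chunk_list(user_ids, 2)

# Each batch can then be sent to an endpoint in its own request
for batch in batches:
    print(batch)
```

The last batch is simply shorter when the list length is not a multiple of `size`.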


<a id=convert-date></a>
#### `o_utils.convert_date_to_iso`

Some of the endpoints require specific time strings to specify where to (for example) search for different tweets.

We provide the `o_utils.convert_date_to_iso` method to make this easier...

In [53]:
from osometweet import utils as o_util

o_util.convert_date_to_iso("2020-1-1")

'2020-01-01T00:00:00Z'

You can also specify the `time_format` argument, using any of the standard `datetime` format codes, if your date string is in a different format.

In [54]:
o_util.convert_date_to_iso("2020", time_format="%Y")

'2020-01-01T00:00:00Z'

In [55]:
o_util.convert_date_to_iso("2020_1_1", time_format="%Y_%m_%d")

'2020-01-01T00:00:00Z'
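
For reference, the conversion itself can be sketched with `datetime.strptime` and `strftime`. This is not the `osometweet` implementation: `to_iso_sketch` is a hypothetical name, and the default `time_format` of `"%Y-%m-%d"` is an assumption based on the first example above.

```python
from datetime import datetime


def to_iso_sketch(date_string, time_format="%Y-%m-%d"):
    """Parse `date_string` with `time_format` and return an ISO 8601 string."""
    parsed = datetime.strptime(date_string, time_format)
    # Twitter expects timestamps like 2020-01-01T00:00:00Z
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_iso_sketch("2020-1-1"))
print(to_iso_sketch("2020", time_format="%Y"))
print(to_iso_sketch("2020_1_1", time_format="%Y_%m_%d"))
```

Any part of the date that the format string does not cover (month, day, time) falls back to the `strptime` defaults, which is why `"2020"` becomes midnight on January 1st.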

<a id=wrangle-functions></a>
## Wrangle functions

**`osometweet.wrangle`** includes a handful of low-level data processing functions that we think could be useful when wrangling your Twitter data into something easier to analyze. The idea behind these functions was to create methods that you can easily adapt to your data processing pipeline, as opposed to creating our own that you must adopt.

Below we provide simple examples of how each function works.

### Contents
- `flatten_dict`
- `flatten_dict` and Twitter data
- `get_dict_val`
- `get_dict_paths`

### Import
We can import these functions via...

In [56]:
from osometweet.wrangle import get_dict_paths, get_dict_val, flatten_dict

<a id=flatten-dict></a>
### `flatten_dict`

This function takes a nested dictionary and "flattens" it so that the keys of each nested dictionary are concatenated into a single string, and the value is the value at the end of that key path. This function can help you simplify the complexity of a nested dictionary (like Twitter's data objects) so it is easier to manage.

Let's see what this means.

In [57]:
# Create dictionary
dictionary = {
    "a" : 1,
    "b" : {
        "c" : 2,
        "d" : 5
    },
    "e" : {
        "f" : 4,
        "g" : 3
    },
    "h" : 3
}

In [58]:
dictionary

{'a': 1, 'b': {'c': 2, 'd': 5}, 'e': {'f': 4, 'g': 3}, 'h': 3}

#### 1. Using function as is

In [59]:
flat_dict = flatten_dict(dictionary)
flat_dict

{'a': 1, 'b.c': 2, 'b.d': 5, 'e.f': 4, 'e.g': 3, 'h': 3}

In [60]:
print(dictionary.keys())
print(flat_dict.keys())

dict_keys(['a', 'b', 'e', 'h'])
dict_keys(['a', 'b.c', 'b.d', 'e.f', 'e.g', 'h'])


#### 2. Changing `parent_key`
This function has an available parameter called `parent_key` which helps it work. Typically, we would recommend that you do not touch this, however, here is what tinkering with this will do - should you find some use for it. 😄 

In [61]:
# Parent key will add `parent_key` as a prefix to all keys
flatten_dict(dictionary, parent_key = "NEW")

{'NEW.a': 1,
 'NEW.b.c': 2,
 'NEW.b.d': 5,
 'NEW.e.f': 4,
 'NEW.e.g': 3,
 'NEW.h': 3}

#### 3. Changing `sep`
Another parameter, `sep`, allows you to control the string that will separate each level of the concatenated key path. As you saw above, the default is a period (i.e., '.'), however, it can be whatever you prefer.

In [62]:
# This string is what will separate key path strings
flatten_dict(dictionary, sep = "_")


{'a': 1, 'b_c': 2, 'b_d': 5, 'e_f': 4, 'e_g': 3, 'h': 3}
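
To see how the three examples above fit together, here is a minimal recursive sketch that reproduces the same behavior. It is written for illustration and is not the `osometweet` source. `flatten_sketch` is a hypothetical name, and it assumes string keys.

```python
def flatten_sketch(d, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level of joined keys."""
    items = {}
    for key, value in d.items():
        # Prefix the key with the path that led to it, if there is one
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse, carrying the current key path down as the new parent
            items.update(flatten_sketch(value, new_key, sep))
        else:
            items[new_key] = value
    return items


dictionary = {"a": 1, "b": {"c": 2, "d": 5}, "e": {"f": 4, "g": 3}, "h": 3}

print(flatten_sketch(dictionary))
print(flatten_sketch(dictionary, parent_key="NEW"))
print(flatten_sketch(dictionary, sep="_"))
```

In this sketch, `parent_key` is what the recursion uses to remember the path so far, which is why setting it manually adds a prefix to every key.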

🚨🚨🚨🚨

### `flatten_dict` and Twitter data

It is important to note that the `flatten_dict` function handles all nested _dictionaries_ but will stop when it reaches something other than a dictionary. This means that for certain data objects which contain a _list_ as the value (e.g. urls and context_annotations), further processing will be needed.

To understand what this means in more detail, I've created a [walk-through](https://github.com/osome-iu/osometweet/wiki/Method:-Wrangle-Practical-Walk-through-(flatten_dict)) of one way you might process a couple of tweets using this function while keeping the above in mind.

🚨🚨🚨🚨

In [63]:
response = ot.user_lookup_usernames(["mdeverna2"], everything=True)

In [64]:
response["data"][0]

{'url': 'https://t.co/1QYmEi2pIM',
 'created_at': '2020-10-04T20:24:15.000Z',
 'protected': False,
 'description': 'Matt DeVerna. Ph.D. student - Indiana University Bloomington Informatics, Complex Networks and Systems. OSoMe Knight Fellow studying misinformation.',
 'profile_image_url': 'https://pbs.twimg.com/profile_images/1314948204916617216/z9HvE_xt_normal.jpg',
 'verified': False,
 'id': '1312850357555539972',
 'pinned_tweet_id': '1364255699996471298',
 'public_metrics': {'followers_count': 81,
  'following_count': 312,
  'tweet_count': 528,
  'listed_count': 2},
 'username': 'mdeverna2',
 'entities': {'url': {'urls': [{'start': 0,
     'end': 23,
     'url': 'https://t.co/1QYmEi2pIM',
     'expanded_url': 'http://matthewdeverna.com',
     'display_url': 'matthewdeverna.com'}]}},
 'name': 'Matthew DeVerna'}

In [65]:
flatten_dict(response["data"][0])

{'url': 'https://t.co/1QYmEi2pIM',
 'created_at': '2020-10-04T20:24:15.000Z',
 'protected': False,
 'description': 'Matt DeVerna. Ph.D. student - Indiana University Bloomington Informatics, Complex Networks and Systems. OSoMe Knight Fellow studying misinformation.',
 'profile_image_url': 'https://pbs.twimg.com/profile_images/1314948204916617216/z9HvE_xt_normal.jpg',
 'verified': False,
 'id': '1312850357555539972',
 'pinned_tweet_id': '1364255699996471298',
 'public_metrics.followers_count': 81,
 'public_metrics.following_count': 312,
 'public_metrics.tweet_count': 528,
 'public_metrics.listed_count': 2,
 'username': 'mdeverna2',
 'entities.url.urls': [{'start': 0,
   'end': 23,
   'url': 'https://t.co/1QYmEi2pIM',
   'expanded_url': 'http://matthewdeverna.com',
   'display_url': 'matthewdeverna.com'}],
 'name': 'Matthew DeVerna'}
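
Notice that `entities.url.urls` is still a list of dictionaries. Here is one possible way to process it further, using an abridged copy of the flattened output above. Both approaches are only sketches, and the indexed key naming is an arbitrary choice made for this example.

```python
# Abridged copy of the flattened user object shown above
flat_user = {
    "username": "mdeverna2",
    "entities.url.urls": [
        {
            "start": 0,
            "end": 23,
            "url": "https://t.co/1QYmEi2pIM",
            "expanded_url": "http://matthewdeverna.com",
            "display_url": "matthewdeverna.com",
        }
    ],
}

urls = flat_user.get("entities.url.urls", [])

# Option 1: pull a single field out of every dictionary in the list
expanded_urls = [u["expanded_url"] for u in urls]
print(expanded_urls)

# Option 2: flatten the list too, using each item's position as part of the key
indexed = {
    f"entities.url.urls.{i}.{key}": value
    for i, item in enumerate(urls)
    for key, value in item.items()
}
print(indexed)
```

Option 1 is handy when you only care about one value per item. Option 2 keeps everything but produces a variable number of keys per object.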

<a id=get-dict-val></a>
### `get_dict_val`

This function returns a dictionary value at the end of a key path - provided as a `list`, like those returned by `get_dict_paths`. 

Here is what this function looks like in practice.

In [66]:
# Create dictionary
dictionary = {
    "a" : 1,
    "b" : {
        "c" : 2,
        "d" : 5
    },
    "e" : {
        "f" : 4,
        "g" : 3
    },
    "h" : 3
}

# Create key_list
key_list = ['e', 'f']

# Execute function
get_dict_val(dictionary, key_list)

4

#### 2. When the input `key_path` doesn't exist
It is important to know that this function does not break if you ask it to return a value at the end of a key path that doesn't exist. Instead, it will return `None`.

In [67]:
# Create key_list
key_list = ['b', 'k']

# Execute function
value = get_dict_val(dictionary, key_list)

# Returns NoneType because the provided path doesn't exist
type(value)


NoneType
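
The behavior described above (walk the key path and return `None` if any step is missing) can be sketched as follows. This is an illustration rather than the library's code, and `get_val_sketch` is a hypothetical name.

```python
def get_val_sketch(dictionary, key_list):
    """Follow `key_list` through nested dictionaries; return None if the path breaks."""
    current = dictionary
    for key in key_list:
        # Stop as soon as we hit a non-dictionary or a missing key
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


dictionary = {"a": 1, "b": {"c": 2, "d": 5}, "e": {"f": 4, "g": 3}, "h": 3}

print(get_val_sketch(dictionary, ["e", "f"]))
print(get_val_sketch(dictionary, ["b", "k"]))
```

Returning `None` instead of raising an error is useful for Twitter data, where a field may be present in one object and absent in the next.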

<a id=get-dict-paths></a>
### `get_dict_paths`
This function returns a **generator** that iterates over all full key paths within `dictionary`. Twitter often returns only the data that is present for a specific data object: certain fields/expansions will only appear within a data object if there is something to return for that field/expansion (see [info](https://github.com/osome-iu/osometweet/wiki/Info:-Available-Fields-and-Expansions) and [our methods](https://github.com/osome-iu/osometweet/wiki/Method:-Specifying-fields-and-expansions) for more details). Because of this, the function can help you understand what your data object actually contains.

Here is a simple example...

In [68]:
# Create dictionary
dictionary = {
    "a" : 1,
    "b" : {
        "c" : 2,
        "d" : 5
    },
    "e" : {
        "f" : 4,
        "g" : 3
    },
    "h" : 3
}

# Call get_dict_paths
print(list(get_dict_paths(dictionary)))

[['a'], ['b', 'c'], ['b', 'd'], ['e', 'f'], ['e', 'g'], ['h']]


In [69]:
all_key_paths = list(get_dict_paths(dictionary))

In [70]:
bs_paths = [path for path in all_key_paths if "b" in path]
bs_paths

[['b', 'c'], ['b', 'd']]

In [71]:
for path in bs_paths:
    print(get_dict_val(dictionary, path))

2
5


In [72]:
tweet_dict_paths = list(get_dict_paths(response["data"][0]))

In [73]:
tweet_dict_paths

[['url'],
 ['created_at'],
 ['protected'],
 ['description'],
 ['profile_image_url'],
 ['verified'],
 ['id'],
 ['pinned_tweet_id'],
 ['public_metrics', 'followers_count'],
 ['public_metrics', 'following_count'],
 ['public_metrics', 'tweet_count'],
 ['public_metrics', 'listed_count'],
 ['username'],
 ['entities', 'url', 'urls'],
 ['name']]

In [74]:
pub_metric_paths = [path for path in tweet_dict_paths if "public_metrics" in path]
pub_metric_paths

[['public_metrics', 'followers_count'],
 ['public_metrics', 'following_count'],
 ['public_metrics', 'tweet_count'],
 ['public_metrics', 'listed_count']]

In [75]:
for path in pub_metric_paths:
    print(
        path[1],
        get_dict_val(response["data"][0], path)
    )

followers_count 81
following_count 312
tweet_count 528
listed_count 2
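
For intuition about what "a generator over all full key paths" means, here is a small recursive sketch. It is not the `osometweet` implementation, and `dict_paths_sketch` is a hypothetical name. As with `flatten_dict`, it treats anything that is not a dictionary (including lists) as the end of a path.

```python
def dict_paths_sketch(dictionary, path=()):
    """Yield every full key path in a nested dictionary as a list of keys."""
    for key, value in dictionary.items():
        current = list(path) + [key]
        if isinstance(value, dict):
            # Keep descending until we reach a non-dictionary value
            yield from dict_paths_sketch(value, current)
        else:
            yield current


dictionary = {"a": 1, "b": {"c": 2, "d": 5}, "e": {"f": 4, "g": 3}, "h": 3}

paths = list(dict_paths_sketch(dictionary))
print(paths)
```

Because it is a generator, nothing is computed until you iterate over it, which is why the examples above wrap the call in `list(...)`.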


### Extracting error details

It can be very useful to take advantage of the information that Twitter returns in the `errors` object. The `user_lookup_usernames` method call below tells us that the account `@donaldjtrump` has been suspended.

In [76]:
ot.user_lookup_usernames(usernames=["donaldjtrump","mdeverna2"], everything=True)

{'data': [{'username': 'mdeverna2',
   'protected': False,
   'entities': {'url': {'urls': [{'start': 0,
       'end': 23,
       'url': 'https://t.co/1QYmEi2pIM',
       'expanded_url': 'http://matthewdeverna.com',
       'display_url': 'matthewdeverna.com'}]}},
   'pinned_tweet_id': '1364255699996471298',
   'description': 'Matt DeVerna. Ph.D. student - Indiana University Bloomington Informatics, Complex Networks and Systems. OSoMe Knight Fellow studying misinformation.',
   'created_at': '2020-10-04T20:24:15.000Z',
   'id': '1312850357555539972',
   'url': 'https://t.co/1QYmEi2pIM',
   'public_metrics': {'followers_count': 81,
    'following_count': 312,
    'tweet_count': 528,
    'listed_count': 2},
   'profile_image_url': 'https://pbs.twimg.com/profile_images/1314948204916617216/z9HvE_xt_normal.jpg',
   'verified': False,
   'name': 'Matthew DeVerna'}],
 'includes': {'tweets': [{'entities': {'urls': [{'start': 68,
       'end': 91,
       'url': 'https://t.co/Q7HqNjavx6',
       