# SOLUTIONS NOTEBOOK

## Exercises

01. Take a look at the following two lists of rules.

rules_1 = [
    
    {"value": '("heat pump" OR "heat pumps") -is:retweet lang:en'},

    {"value": '("gas boiler" OR "gas boilers") -is:retweet lang:en'},

]

and 

rules_2 = [

    {"value": '("heat pump" OR "heat pumps" OR "gas boiler" OR "gas boilers") -is:retweet lang:en'},

]

Do we collect the exact same tweets from rules_1 and rules_2?

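*Your answer:* Yes, both collect the same set of tweets, since a tweet matches the single OR query in rules_2 exactly when it matches at least one of the two rules in rules_1. The difference is that with rules_1 a tweet mentioning both a heat pump and a gas boiler matches both rules and can be collected twice. If that overlap is likely and the combined query stays within the maximum query length allowed, a single rule as in rules_2 is preferable.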
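A minimal sketch of the duplication with made-up tweets, assuming each rule in rules_1 is run as a separate search and the per-rule results are stacked into one pandas DataFrame (how `collect_and_process_twitter_data` actually combines results is not shown in this notebook). Dropping duplicate ids recovers the rules_2 result set:

```python
import pandas as pd

# Hypothetical results of the two rules in rules_1, run one after the other
heat_pump_results = pd.DataFrame({
    "id": ["1", "2"],
    "text": ["New heat pump installed", "Heat pump or gas boiler?"],
})
gas_boiler_results = pd.DataFrame({
    "id": ["2", "3"],
    "text": ["Heat pump or gas boiler?", "Gas boiler repair today"],
})

# Stacking the per-rule results keeps tweet "2" twice, once per matching rule
stacked = pd.concat([heat_pump_results, gas_boiler_results], ignore_index=True)

# Keeping one row per tweet id gives the same set of tweets as the single rule in rules_2
deduplicated = stacked.drop_duplicates(subset="id").reset_index(drop=True)

print(len(stacked), len(deduplicated))  # 4 3
```

With a single combined rule the deduplication step is unnecessary, which is one practical reason to prefer rules_2.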

02. Using the rules defined below, collect data including:
- tweet fields: tweet id, tweet text, author id, tweet creation date and time, context annotations, entities, geo, public metrics, source
- user fields: user id, name, username, date and time the user created the account, description, location, whether the user is verified, public metrics
- place fields: place id, country, country_code, place name, place full name, geo and place_type


*Helpful links:*
- [Tweet fields](https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet)
- [User fields](https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user)
- [Place fields](https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/place)

In [None]:
from collect_and_process_search_data import *
import os
import pandas as pd

In [None]:
rules = [
    {"value": '("heat pump" OR "heat pumps") -is:retweet lang:en', "tag": "exercise_2"},
]

In [None]:
query_parameters = {
    "tweet.fields": "id,text,author_id,created_at,context_annotations,entities,geo,public_metrics,source",
    "user.fields": "id,name,username,created_at,description,location,verified,public_metrics",
    "place.fields": "id,country,country_code,name,full_name,geo,place_type",
    "expansions": "author_id,geo.place_id",
    "max_results": 100,
}
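To see what the API receives, the sketch below merges one rule with query parameters into a request URL for the recent search endpoint (`https://api.twitter.com/2/tweets/search/recent`). It assumes, without having seen its source, that `collect_and_process_twitter_data` sends each rule's `value` as the `query` parameter and uses the `tag` only locally (it appears to name the output files). A real request also needs an `Authorization: Bearer <token>` header, which is not built here; `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urlparse, parse_qs

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_url(rule, params):
    """Combine a rule and query parameters into a recent-search URL.

    Assumption: the rule's "value" is sent as "query"; its "tag" is not sent.
    """
    full_params = {"query": rule["value"], **params}
    # urlencode percent-encodes quotes, parentheses, colons and commas
    return f"{SEARCH_URL}?{urlencode(full_params)}"


rule = {"value": '("heat pump" OR "heat pumps") -is:retweet lang:en', "tag": "exercise_2"}
params = {
    "tweet.fields": "id,text,author_id",
    "expansions": "author_id",
    "max_results": 100,
}
url = build_search_url(rule, params)
print(url)

# Decoding the URL gives back the original query string
decoded = parse_qs(urlparse(url).query)
print(decoded["query"][0] == rule["value"])  # True
```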

In [None]:
bearer_token = os.environ.get("BEARER_TOKEN")
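`os.environ.get` returns `None` silently when the variable is missing, and passing `None` on to the collection function would most likely only show up later as an authentication error. A small sketch of a stricter lookup (the helper name `get_bearer_token` is ours, not part of the notebook's module):

```python
import os


def get_bearer_token(var_name="BEARER_TOKEN"):
    """Return the bearer token stored in an environment variable, failing early if absent."""
    token = os.environ.get(var_name)
    if not token:
        # Missing or empty: stop here instead of sending an unauthenticated request
        raise RuntimeError(f"Environment variable {var_name} is not set or is empty")
    return token
```

Usage: `bearer_token = get_bearer_token()`.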

In [None]:
collect_and_process_twitter_data(bearer_token, rules, query_parameters)

In [None]:
tweets_ex2 = pd.read_pickle("tweets_exercise_2.pkl")
users_ex2 = pd.read_pickle("users_exercise_2.pkl")
places_ex2 = pd.read_pickle("places_exercise_2.pkl")

In [None]:
tweets_ex2

In [None]:
users_ex2

In [None]:
places_ex2
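The tables are linked by ids: each tweet's `author_id` points at a user's `id`. A minimal sketch of attaching user details to tweets with made-up rows, assuming the pickled DataFrames keep the API field names as column names (check `tweets_ex2.columns` and `users_ex2.columns` first):

```python
import pandas as pd

# Hypothetical stand-ins for tweets_ex2 and users_ex2
tweets = pd.DataFrame({
    "id": ["100", "101", "102"],
    "text": ["Heat pump fitted", "Loving my heat pump", "Heat pump grants?"],
    "author_id": ["1", "2", "1"],
})
users = pd.DataFrame({
    "id": ["1", "2", "1"],  # the same user may appear more than once across result pages
    "username": ["alice", "bob", "alice"],
})

# Rename the user id so it lines up with author_id, and keep one row per user
# so that the merge cannot duplicate tweets
users_by_author = users.rename(columns={"id": "author_id"}).drop_duplicates(subset="author_id")

tweets_with_users = tweets.merge(users_by_author, on="author_id", how="left")
print(tweets_with_users[["id", "username"]])
```

A left merge keeps every tweet, even one whose author is missing from the users table.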

03. Using the same query parameters you defined above, change your rules so that the data you collect contains neither the #heatpump nor the #heatpumps hashtag.

*Helpful links:*
- Take a look at the list of operators [here](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query#list)

In [None]:
# change this list to exclude the #heatpump and #heatpumps hashtags
rules = [
    {"value": '("heat pump" OR "heat pumps") -(#heatpump OR #heatpumps) -is:retweet lang:en', "tag": "exercise_3"},
]

In [None]:
# copy your answer from the previous exercise
query_parameters = {
    "tweet.fields": "id,text,author_id,created_at,context_annotations,entities,geo,public_metrics,source",
    "user.fields": "id,name,username,created_at,description,location,verified,public_metrics",
    "place.fields": "id,country,country_code,name,full_name,geo,place_type",
    "expansions": "author_id,geo.place_id",
    "max_results": 100,
}

In [None]:
collect_and_process_twitter_data(bearer_token, rules, query_parameters)

In [None]:
tweets_ex3 = pd.read_pickle("tweets_exercise_3.pkl")
users_ex3 = pd.read_pickle("users_exercise_3.pkl")
places_ex3 = pd.read_pickle("places_exercise_3.pkl")
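To check the exclusion worked, the hashtags of each tweet can be inspected. In the API's `entities` object, hashtags are listed under `entities["hashtags"]`, each with a `tag` field that omits the `#`. This sketch assumes the helper stores that object as a dict in an `entities` column, uses made-up rows, and compares tags case-insensitively to be safe; on `tweets_ex3` the filtered result should be empty.

```python
import pandas as pd

EXCLUDED = {"heatpump", "heatpumps"}


def has_excluded_hashtag(entities):
    """Return True if a tweet's entities contain any excluded hashtag."""
    # Tweets without entities may hold None or NaN instead of a dict
    if not isinstance(entities, dict):
        return False
    tags = {h["tag"].lower() for h in entities.get("hashtags", [])}
    return bool(tags & EXCLUDED)


# Hypothetical stand-in for tweets_ex3
tweets = pd.DataFrame({
    "id": ["1", "2", "3"],
    "entities": [
        {"hashtags": [{"start": 0, "end": 8, "tag": "climate"}]},
        None,
        {"hashtags": [{"start": 10, "end": 19, "tag": "HeatPump"}]},
    ],
})

flagged = tweets[tweets["entities"].apply(has_excluded_hashtag)]
print(flagged["id"].tolist())  # ['3']
```

On the real data: `tweets_ex3[tweets_ex3["entities"].apply(has_excluded_hashtag)]`.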

04. Change **rules** and **query_parameters** below to collect data from Twitter satisfying the following requirements:
- mentioning **ChatGPT** but *not mentioning* programming, refactoring or code;
- no retweets;
- written in English;
- from verified authors;
- posted between 2pm and 3pm (UK time) on the 12th of January 2023;
- 100 results per call to the API;
- tweet fields, user fields and place fields as in exercise 2.


*Helpful links:*
- Take a look at the list of operators [here](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query#list)
- To know more about start and end times parameters [check this page](https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent)

In [None]:
rules = [
    {"value": '(chatgpt OR "chat-gpt") -(programming OR refactoring OR code) -is:retweet lang:en is:verified', "tag":"exercise_4"},
]
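The rule above follows a fixed pattern: any of the wanted terms, none of the unwanted ones, then the operator filters. A sketch that assembles such a query from lists and checks its length, assuming a 512-character limit (the figure the recent search reference gave for standard access; verify it for your access level). `group` and `build_query` are hypothetical helpers:

```python
MAX_QUERY_LENGTH = 512  # assumed limit; check the one for your API access level


def group(terms):
    """Join terms with OR inside parentheses, quoting multi-word or hyphenated terms."""
    quoted = [f'"{t}"' if (" " in t or "-" in t) else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(include, exclude, operators):
    """Build a query matching any `include` term and none of the `exclude` terms."""
    query = " ".join([group(include), "-" + group(exclude), *operators])
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(
            f"Query has {len(query)} characters, above the assumed limit of {MAX_QUERY_LENGTH}"
        )
    return query


query = build_query(
    include=["chatgpt", "chat-gpt"],
    exclude=["programming", "refactoring", "code"],
    operators=["-is:retweet", "lang:en", "is:verified"],
)
print(query)
```

The result can be used directly as `rules = [{"value": query, "tag": "exercise_4"}]`.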

In [None]:
query_parameters = {
    "tweet.fields": "id,text,author_id,created_at,context_annotations,entities,geo,public_metrics,source",
    "user.fields": "id,name,username,created_at,description,location,verified,public_metrics",
    "place.fields": "id,country,country_code,name,full_name,geo,place_type",
    "expansions": "author_id,geo.place_id",
    "max_results": 100,
    "start_time":"2023-01-12T14:00:00Z",
    "end_time":"2023-01-12T15:00:00Z"
}
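UK time in January is GMT, which coincides with UTC, so 2pm UK time is `14:00:00Z`. During British Summer Time the two differ by one hour. A small sketch converting UK wall-clock times into the `YYYY-MM-DDTHH:mm:ssZ` format used for `start_time` and `end_time`, using the standard library's `zoneinfo` (Python 3.9+, and it needs time zone data on the system); `uk_to_api_utc` is our own helper name:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

UK = ZoneInfo("Europe/London")


def uk_to_api_utc(year, month, day, hour, minute=0):
    """Convert a UK wall-clock time to a UTC timestamp like 2023-01-12T14:00:00Z."""
    local = datetime(year, month, day, hour, minute, tzinfo=UK)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


start_time = uk_to_api_utc(2023, 1, 12, 14)
end_time = uk_to_api_utc(2023, 1, 12, 15)
print(start_time, end_time)  # 2023-01-12T14:00:00Z 2023-01-12T15:00:00Z

# In summer (BST = UTC+1) the same wall-clock hour maps to one hour earlier in UTC
print(uk_to_api_utc(2023, 7, 12, 14))  # 2023-07-12T13:00:00Z
```

If `collect_and_process_twitter_data` uses the recent search endpoint, these times must fall within the last seven days when the cell is run.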

In [None]:
collect_and_process_twitter_data(bearer_token, rules, query_parameters)

In [None]:
tweets_ex4 = pd.read_pickle("tweets_exercise_4.pkl")
users_ex4 = pd.read_pickle("users_exercise_4.pkl")
places_ex4 = pd.read_pickle("places_exercise_4.pkl")

In [None]:
tweets_ex4

In [None]:
places_ex4
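The `geo.place_id` expansion is what links tweets to places: a geotagged tweet's `geo` field holds a `place_id` matching an `id` in the places table, while many tweets have no `geo` at all. A sketch with made-up rows, assuming `geo` is stored as a dict column as returned by the API and that the places table keeps the API field names:

```python
import pandas as pd


def place_id_of(geo):
    """Extract the place id from a tweet's geo field, or None if it is absent."""
    return geo.get("place_id") if isinstance(geo, dict) else None


# Hypothetical stand-ins for tweets_ex4 and places_ex4
tweets = pd.DataFrame({"id": ["10", "11"], "geo": [{"place_id": "p1"}, None]})
places = pd.DataFrame({
    "id": ["p1"],
    "full_name": ["London, England"],
    "country_code": ["GB"],
})

tweets["place_id"] = tweets["geo"].apply(place_id_of)

# Left merge: tweets without a place keep their row, with missing place columns
with_places = tweets.merge(places.rename(columns={"id": "place_id"}), on="place_id", how="left")
print(with_places[["id", "full_name"]])
```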

05. Check the *id* of the first tweet you collected in the previous exercise (which corresponds to the latest tweet). Change the query_parameters dictionary so that it only collects tweets posted after that one (*hint:* use the since_id query parameter).


*Helpful links:*
- Take a look at the list of operators [here](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query#list)
- To know more about since_id parameter [check this page](https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent) or [this page](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/paginate)

In [None]:
tweets_ex4

In [None]:
rules = [
    {"value": '(chatgpt OR "chat-gpt") -(programming OR refactoring OR code) -is:retweet lang:en is:verified', "tag":"exercise_5"},
]

In [None]:
query_parameters = {
    "tweet.fields": "id,text,author_id,created_at,context_annotations,entities,geo,public_metrics,source",
    "user.fields": "id,name,username,created_at,description,location,verified,public_metrics",
    "place.fields": "id,country,country_code,name,full_name,geo,place_type",
    "expansions": "author_id,geo.place_id",
    "max_results": 100,
    # tweet ids grow with creation time, so the largest id is the latest tweet
    # (this does not depend on the row order of tweets_ex4)
    "since_id": str(tweets_ex4["id"].astype("int64").max()),
}

In [None]:
collect_and_process_twitter_data(bearer_token, rules, query_parameters)

In [None]:
tweets_ex5 = pd.read_pickle("tweets_exercise_5.pkl")
users_ex5 = pd.read_pickle("users_exercise_5.pkl")
places_ex5 = pd.read_pickle("places_exercise_5.pkl")
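`since_id` returns only tweets whose id is greater than the one given, and because tweet ids grow with creation time, that means newer tweets. A sketch with made-up ids that checks this on the new batch and appends it to the previous one without duplicates, assuming `id` is stored as a numeric string (`newer_than` is a hypothetical helper):

```python
import pandas as pd


def newer_than(tweets, since_id):
    """Keep only tweets whose id is greater than since_id."""
    # Compare ids as integers: they are numeric strings in the API response
    return tweets[tweets["id"].astype("int64") > int(since_id)]


# Hypothetical stand-ins for tweets_ex4 and tweets_ex5
previous = pd.DataFrame({"id": ["1613521000000000003", "1613520000000000001"], "text": ["b", "a"]})
since_id = str(previous["id"].astype("int64").max())
latest = pd.DataFrame({"id": ["1613600000000000005", "1613590000000000002"], "text": ["d", "c"]})

# Every tweet returned with since_id should be newer than the reference tweet
all_newer = len(newer_than(latest, since_id)) == len(latest)

# Append the new batch to the old one, keeping one row per tweet id
combined = (
    pd.concat([latest, previous], ignore_index=True)
    .drop_duplicates(subset="id")
    .reset_index(drop=True)
)
print(all_newer, len(combined))  # True 4
```

On the real data the same check is `newer_than(tweets_ex5, query_parameters["since_id"])`.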