<h4>Skill 1 description:</h4>
Given a series of tweets on a theme, can GPT-3 generate additional tweets about that theme?

<h4>What this code does:</h4>
The code in this notebook only collects tweets via the Twitter API. Once we had gathered several tweets, I entered them directly into GPT-3 (via OpenAI's webpage for GPT-3, not the API). The input and output are placed in a text block at the bottom of this file.

<h3>Gathering Tweets from the Twitter API</h3>

This code was executed from the command line using the search_tweets.py package. Because both the API product we have access to and the support for search_tweets.py have since changed, it may no longer execute properly. This folder also contains a .json file with the results of this API call. The call itself collects ~500 replies to an influential "climate contrarian" publication, @ClimateDepot, where each reply contains at least one of a series of keywords and was posted during the Trump administration (prior to the 2020 election).

search_tweets.py \
--filter-rule "(climate OR CO2 OR expert OR science OR scientist) lang:en to:ClimateDepot is:reply" \
--start-datetime 2017-01-21 \
--end-datetime 2020-11-02 \
--filename-prefix ClimateDepot_replies \
--max-results 500 \
--print-stream

<h3>Reformatting and Identifying the Top Tweets</h3>

In [11]:
import json
import pandas as pd
import re

In [2]:
# The Twitter API output is a .json file with one JSON object per line, which is not valid JSON as a whole.
# This function wraps those lines in a single JSON object keyed by line number and saves it to a new file

def reformat_json(filename):
    # The with-block closes the file automatically, so no explicit close() is needed
    with open('{}.json'.format(filename), 'r') as f:
        data = f.readlines()
    
    new_file = """{\n"""
    for i in range(len(data)-1):
        new_file += '"{}": {},\n'.format(i, data[i])
    # The last entry gets key len(data)-1 (keeps keys consecutive) and no trailing comma
    new_file += '"{}": {}\n'.format(len(data)-1, data[-1])
    new_file += """}"""
    
    with open('{}_reformatted.json'.format(filename), 'w') as f:
        f.write(new_file)

In [3]:
# This function returns the .json output in a way that makes it easier to query by fields like favorite_count

def open_df(filename):
    df = pd.read_json('{}_reformatted.json'.format(filename))
    df = df.T    
    return df
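
As an alternative to rewriting the file, each line can be parsed on its own. The following is a minimal sketch, not the approach used for the paper; `load_json_lines` and the two-tweet sample file are hypothetical and exist only for illustration.

```python
import json
import os
import tempfile


def load_json_lines(path):
    """Parse a file holding one JSON object per line into a list of dicts."""
    records = []
    with open(path, 'r') as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Hypothetical two-tweet file, written to a temporary directory for the demo
sample_lines = [
    json.dumps({'text': 'first tweet', 'favorite_count': 3}),
    json.dumps({'text': 'second tweet', 'favorite_count': 7}),
]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, 'sample.json')
    with open(sample_path, 'w') as f:
        f.write('\n'.join(sample_lines) + '\n')
    records = load_json_lines(sample_path)

print(len(records), records[1]['favorite_count'])
```

The resulting list of dicts can be passed to `pd.DataFrame(records)`. pandas can also read this layout directly with `pd.read_json(path, lines=True)`.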

In [4]:
# The API used to collect these tweets (v 1.1) saved the content of tweets over 140 characters in a different field.
# This function reformats the df to include the full text of n tweets in one field

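def unpack_tweets(df, n):
    # Assumes df has a 0..len(df)-1 integer index (sample_tweets resets it before calling)
    mask = df.extended_tweet.notnull()
    # Never read past the end of the DataFrame if fewer than n tweets are available
    n = min(n, len(df))
    tweets = [df.extended_tweet[i]['full_text'] if mask[i] else df.text[i] for i in range(n)]
    return tweets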
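
The fallback logic can be shown on plain dictionaries, without pandas. This is a minimal sketch under the assumption that a tweet carries an `extended_tweet` dict only when its text was truncated; `full_text` and both sample tweets are hypothetical.

```python
def full_text(tweet):
    """Return the untruncated text of a single tweet dict."""
    extended = tweet.get('extended_tweet')
    # Only a dict counts as an extended payload; None or NaN fall through
    if isinstance(extended, dict):
        return extended['full_text']
    return tweet['text']


# Hypothetical examples: one short tweet, one truncated tweet
short_tweet = {'text': 'A short tweet', 'extended_tweet': None}
long_tweet = {
    'text': 'A long tweet that was cut...',
    'extended_tweet': {'full_text': 'A long tweet that was cut off in the text field'},
}

print(full_text(short_tweet))
print(full_text(long_tweet))
```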

In [13]:
# This function selects the n tweets with the most likes

def sample_tweets(df, n):
    
    # First remove retweets (note: this substring match drops any tweet containing 'RT ' anywhere in its text)
    df = df[~df.text.str.contains('RT ')]
    
    df_sorted = df.sort_values(by='favorite_count', ascending=False).reset_index()
    tweets = unpack_tweets(df_sorted, n=n)
    
    # Text processing: remove urls, handles, and line breaks. Replace the string '&amp;' with the & character
    # Raw strings keep backslash sequences such as \S from being treated as (invalid) string escapes
    tweet_texts = [re.sub(r'http\S+', '', tweet) for tweet in tweets]
    tweet_texts = [re.sub(r'@\S+\s', '', tweet) for tweet in tweet_texts]
    tweet_texts = [re.sub(r'\n', ' ', tweet) for tweet in tweet_texts]
    tweet_texts = [re.sub(r'&amp;', '&', tweet) for tweet in tweet_texts]
    
    return tweet_texts
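
To see what the four substitutions do, they can be applied to a single string. This is a small sketch; `clean_tweet` and the sample text (including its URL) are made up for illustration.

```python
import re


def clean_tweet(text):
    """Apply the same four substitutions used in sample_tweets to one string."""
    text = re.sub(r'http\S+', '', text)  # drop URLs
    text = re.sub(r'@\S+\s', '', text)   # drop @handles followed by whitespace
    text = re.sub(r'\n', ' ', text)      # turn line breaks into spaces
    text = re.sub(r'&amp;', '&', text)   # unescape ampersands
    return text


# Hypothetical reply containing a handle, an escaped ampersand, a line break, and a URL
sample = '@ClimateDepot Science &amp; politics\ndo not mix https://t.co/abc123'
cleaned = clean_tweet(sample)
print(repr(cleaned))
```

Note that the space before the removed URL is left behind, so a final `.strip()` may be worth adding if trailing whitespace matters.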

In [21]:
# Finally, give the name of a data file and return a prompt built from the top n tweets

def select_tweets(filename, n):
    reformat_json(filename)
    df = open_df(filename)
    selected_tweets = sample_tweets(df, n=n)
    
    # Use the selected tweets to create a prompt string to feed to GPT-3.
    # The prompt ends with an open "Tweet n+1:" label for the model to complete.
    prompt = ''
    for i in range(len(selected_tweets)):
        prompt += 'Tweet {}: {}\n\n'.format(i+1, selected_tweets[i])
    prompt += 'Tweet {}:'.format(len(selected_tweets)+1)
    
    return prompt
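
The prompt format can be checked in isolation. The sketch below builds the same string from a plain list; `build_prompt` is a hypothetical helper, and the two input strings are placeholders rather than real tweets.

```python
def build_prompt(tweets):
    """Number each tweet and end with an open label for the model to continue."""
    lines = ['Tweet {}: {}'.format(i, t) for i, t in enumerate(tweets, start=1)]
    lines.append('Tweet {}:'.format(len(tweets) + 1))
    return '\n\n'.join(lines)


# Placeholder inputs, not real tweets
demo_prompt = build_prompt(['first example', 'second example'])
print(demo_prompt)
```

Ending on a bare `Tweet 3:` label is what nudges the model to write the next tweet in the series instead of some other kind of text.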

In [22]:
# Results

print(select_tweets('ClimateDepot_replies', 10))

Tweet 1: There should always be a clear distinction between the engineers of NASA who achieve things, and the climate mob pilfering their name and bloated on research funding.

Tweet 2: Here is some background on how Greta's manufactured rise to climate stardom occurred. 

Tweet 3: The Red Pope says we only have a few years to fix the climate and this time he really, really, really means it, really. 


Tweet 5: The idea that humans can control climate change is delusional.

Tweet 6: All science is refutable, that is what makes it science. If something is not capable of being refuted by experiment or observation then it is not science. 

Tweet 7: They mean that 4 more years of Trump is game over for the climate scam  I think so too, but I'm not certain.

Tweet 8: She is obviously not a scientist where science is the pursuit of 'absolute truths' & has no relevance to her political biases & goals. She frankly, epitomizes all that is wrong with the attempted politicization of science by pe

<h3>Generating a GPT-3 Response</h3>

The following code block was not executed and requires an OpenAI API access key. Running it would probabilistically generate a series of tweets that continue the list above. Temperature is a parameter that controls the randomness of the output; the call below uses 0.7, the default setting at the time.

In [None]:
import openai
openai.api_key = "[ACCESS KEY GOES HERE]"

prompt = select_tweets('ClimateDepot_replies', 10)

response = openai.Completion.create(engine='davinci', prompt=prompt, max_tokens=200, n=1, stop=['Tweet 16'], temperature=0.7)
response = response.get('choices')[0].text.strip()
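
To give an intuition for the temperature parameter, the sketch below shows the standard temperature-scaled softmax on made-up scores. It illustrates the general idea only and is not a description of OpenAI's implementation; the function name and the three logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.7)
high = softmax_with_temperature(logits, 1.5)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

A lower temperature concentrates probability on the top-scoring token, so output is more predictable; a higher temperature spreads probability out, so output is more varied.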

Our first output (used in the paper):

The best climate change propaganda is not propaganda at all, because it is not obviously propaganda. It does not name itself as such, and therefore, readers are more likely to believe it.

Tweet 12: The US government says we can't afford to build new nuclear power plants. But it is spending billions to subsidize wind and solar, which are killing people.

Tweet 13: I don't think it's a coincidence that climate change is the new global warming. They can't talk about temperature increases because they're no longer happening.

Tweet 14: Climate change is the new communism - an ideology based on a false science that cannot be questioned.

Tweet 15: The climate mafia has done far more to damage the reputation of science than anything else in the history of science.
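
A completion like the one above can be split back into individual tweets by cutting at each `Tweet N:` label. This is a minimal sketch; `split_completion` is a hypothetical helper, and the sample string reuses two of the generated tweets shown above.

```python
import re


def split_completion(text):
    """Split a raw completion into tweets at each 'Tweet N:' label."""
    parts = re.split(r'\s*Tweet \d+:\s*', text)
    # Drop empty pieces and surrounding whitespace
    return [p.strip() for p in parts if p.strip()]


raw = (
    'Climate change is the new communism - an ideology based on a false '
    'science that cannot be questioned.\n\n'
    'Tweet 15: The climate mafia has done far more to damage the reputation '
    'of science than anything else in the history of science.\n\n'
)
generated = split_completion(raw)
print(len(generated))
```

The first piece has no label because it continues the open `Tweet N:` line at the end of the prompt.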