# GenCT

This notebook combines blocks of code from the previous notebooks. It grabs the top hot posts (here 200) from `r/conspiracy`, derives topics from them, and then feeds a random selection of topic keywords, along with a random post to serve as a model, to ChatGPT to generate a new post.

In [1]:
# IMPORTS
import os, praw, pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from nltk.tokenize import word_tokenize
import openai, random

# SECRET DECODER RING
webber_id = os.getenv("REDDIT_API_ID")
webber_secret = os.getenv("REDDIT_API_SECRET")
openai.api_key = os.getenv("OPENAI_API_KEY")

## Get Posts from Reddit

In [2]:
reddit = praw.Reddit(
    client_id=webber_id,
    client_secret=webber_secret,
    user_agent="webber"
    )
print(reddit.read_only)

True


This block captures only `post.selftext` for each post and hands that list off to the subsequent blocks. Link posts have an empty `selftext`, which is why some entries in the list are empty strings.

In [3]:
conspiracy = reddit.subreddit('conspiracy')

posts = []
for post in conspiracy.hot(limit=200):
    posts.append(post.selftext)

In [4]:
posts[0:3]

['',
 'Right now on this sub you have a top post of Biden’s son massive Sex Scandal trending at number 3 on this sub right now. \n\nhttps://www.reddit.com/r/conspiracy/comments/13zj3ve/beaver_hunter_bidens_laptop_pics_are_hilarious/?utm_source=share&utm_medium=ios_app&utm_name=ioscss&utm_content=2&utm_term=1\n\nThe fact that this ISNT trending on Big Tech or Reddit tells you how fucking biased Big Tech is. This is the President’s SON. The media went crazy over Golden Shower tapes that never existed. \n\nYet here you have content gold that would sell bigly in the press and yet because of Partisan Politics silence.. \n\nTrump, Clinton, Anthony Weiner, the list goes on.\n\nType in Trump Golden Shower on Google and you get:\n\n8,420,000 results (0.47 seconds)\n\nType in Hunter Biden Sex on Google and you get:\n\n7,910,000 results (0.57 seconds)\n\nOne thing doesn’t exist (Golden Shower) the other thing does. \n\nThe Hunter Biden Sex Scandal has so much content for the media to sell it’s cr

In [5]:
text = ' '.join(posts)
len(text)

130143

In [6]:
# Drop any 'nan' placeholders, then keep posts of 300 or more characters
nonan = [x for x in posts if str(x) != 'nan']
substantive = [post for post in nonan if len(post) > 299]
len(substantive)

69
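
The collection and filtering steps above could be folded into one helper. The following is a minimal sketch, not the notebook's own code: `collect_substantive` and `min_chars` are hypothetical names, and the stand-in posts are made up for illustration. It accepts any iterable of objects with a `selftext` attribute, so with PRAW it could presumably be given `conspiracy.hot(limit=200)` directly.

```python
from types import SimpleNamespace

def collect_substantive(submissions, min_chars=300):
    """Return the selftext of each submission with at least min_chars characters."""
    kept = []
    for submission in submissions:
        text = submission.selftext or ""   # link posts carry an empty selftext
        if len(text) >= min_chars:
            kept.append(text)
    return kept

# Stand-in objects so the sketch runs without Reddit credentials
fake_posts = [
    SimpleNamespace(selftext=""),
    SimpleNamespace(selftext="short post"),
    SimpleNamespace(selftext="x" * 300),
]
sample = collect_substantive(fake_posts)
print(len(sample))
```

The default of 300 matches the notebook's `len(post) > 299` test.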

## Get Topics from Posts

In [7]:
# Display Functions
# =-=-=-=-=-=-=-=-=-=-=

def topic_details(model, feature_names, n_top_words):
    """Print each topic's top words with their rounded weights."""
    for topic_idx, topic in enumerate(model.components_):
        # argsort() is ascending, so this slice walks back from the largest weight
        top = topic.argsort()[:-n_top_words - 1:-1]
        message = "{:d}: ".format(topic_idx)
        message += " ".join([feature_names[i] + ' ' + str(round(topic[i], 2)) + ','
                             for i in top])
        print(message)
    print()

def just_words(model, feature_names, n_top_words):
    """Return one space-separated string of top words per topic."""
    topics = []
    for topic in model.components_:
        words = " ".join([feature_names[i] for i in topic.argsort()[:-n_top_words - 1:-1]])
        topics.append(words)
    return topics
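
Both functions rely on the slice `topic.argsort()[:-n_top_words - 1:-1]`, which reads the ascending sort order backwards from its end to get the indices of the largest weights. As a dependency-free illustration (the helper name `top_indices` and the toy weights are assumptions, not part of the notebook), the same selection can be written in plain Python:

```python
def top_indices(weights, n_top):
    """Indices of the n_top largest weights, largest first."""
    # Ascending order of indices by weight, mirroring what argsort() returns
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    # Walk back from the end of the ascending order, n_top steps
    return order[:-n_top - 1:-1]

toy_weights = [0.1, 0.9, 0.3, 0.7]
toy_features = ["moon", "gold", "crop", "dna"]
top_words = [toy_features[i] for i in top_indices(toy_weights, 2)]
print(top_words)
```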

In [8]:
# SETTINGS
n_topics = 20
n_features = 5000
n_top_words = 5
n_top_documents = 5

# NMF can work directly on tf-idf weights
vectorizer = TfidfVectorizer(max_df = 0.9,
                             min_df = 2,
                             max_features = n_features,
                             stop_words = 'english')
X = vectorizer.fit_transform(substantive)
features = vectorizer.get_feature_names_out()
features.shape

(1276,)

In [9]:
# the NMF topic model
model = NMF(n_components=n_topics, 
            init='random',
            random_state=0, 
            max_iter=500
)
W = model.fit_transform(X)
H = model.components_
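
`W` holds one row per document and one column per topic, so it can be used to find the posts that load most heavily on each topic, which is presumably what the otherwise unused `n_top_documents` setting was intended for. A minimal sketch, with `top_documents` as a hypothetical helper and a small hand-made matrix standing in for `W`:

```python
def top_documents(doc_topic, n_docs):
    """For each topic (column), return the row indices of the n_docs highest weights."""
    n_topics = len(doc_topic[0])
    ranked = []
    for topic_idx in range(n_topics):
        # Sort document indices by their weight on this topic, heaviest first
        by_weight = sorted(range(len(doc_topic)),
                           key=lambda d: doc_topic[d][topic_idx],
                           reverse=True)
        ranked.append(by_weight[:n_docs])
    return ranked

# Toy document-topic weights: 4 documents, 2 topics
toy_W = [
    [0.8, 0.0],
    [0.1, 0.6],
    [0.5, 0.2],
    [0.0, 0.9],
]
ranked = top_documents(toy_W, 2)
print(ranked)
```

With the notebook's variables this would be called as `top_documents(W, n_top_documents)`, and the returned indices would point into `substantive`, since that is the list the vectorizer was fitted on.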



In [10]:
topic_details(model, features, n_top_words)

0: yah 1.27, hermes 0.94, called 0.92, light 0.87, ai 0.72,
1: night 0.54, saw 0.5, started 0.48, bit 0.46, getting 0.36,
2: youtube 1.17, watch 1.08, www 0.87, https 0.84, com 0.83,
3: believe 1.92, flat 1.03, earth 0.86, hold 0.64, wealth 0.58,
4: kids 0.98, federal 0.64, children 0.5, elite 0.45, body 0.39,
5: thought 2.76, just 1.88, data 1.1, ve 1.0, ai 0.91,
6: accounts 1.99, gold 1.55, basically 1.47, typically 1.26, match 1.23,
7: crop 1.26, evidence 1.15, circles 0.83, burst 0.83, inside 0.61,
8: study 1.42, working 0.8, https 0.78, twitter 0.59, status 0.55,
9: apart 0.94, giant 0.77, information 0.75, service 0.62, censored 0.61,
10: goverment 1.43, gain 0.97, money 0.66, election 0.58, fund 0.55,
11: military 2.13, ukraine 1.28, spending 1.01, know 0.95, defense 0.92,
12: days 1.32, x200b 1.22, june 0.98, day 0.85, 27 0.57,
13: epstein 1.36, parties 0.68, white 0.67, people 0.46, office 0.42,
14: dna 1.08, sun 0.83, consciousness 0.83, carbon 0.78, humanity 0.76,
15: world 

In [11]:
topics = just_words(model, features, 10)
for topic in topics:
    print(topic)

yah hermes called light ai plasma said zeus bound mercury
night saw started bit getting weird hands like dots raised
youtube watch www https com png associated hundreds wiped specifically
believe flat earth hold wealth media thoughts culture think controlled
kids federal children elite body wants dragon columbia expression kid
thought just data ve ai totally qanon explains best topic
accounts gold basically typically match unless wanted couple post day
crop evidence circles burst inside proof plant real youtu maybe
study working https twitter status 20 mental com long scientific
apart giant information service censored search sources interesting pushed video
goverment gain money election fund ive care dollars elites person
military ukraine spending know defense nato actually russia big oil
days x200b june day 27 july taiwan year china 223
epstein parties white people office bit father mention founding like
dna sun consciousness carbon humanity rna earth entities history ascension
world

In [12]:
# Tokenize each topic string, then flatten the clusters into one keyword list
keyword_clusters = [word_tokenize(topic) for topic in topics]
keywords = [word for sublist in keyword_clusters for word in sublist]
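
Because the same word can rank highly in several topics (`ai`, `https`, and `earth` each appear more than once in the topic list above), `keywords` contains duplicates, and URL fragments such as `www` and `com` are in the mix too. If that is unwanted, one option is to de-duplicate and filter before sampling. This is a sketch only: `clean_keywords` and the `debris` set are hypothetical, and the set is an illustrative guess rather than a vetted stop list.

```python
def clean_keywords(words, debris=frozenset({"https", "www", "com", "png", "x200b", "youtu"})):
    """Drop duplicates (keeping first-seen order) and any words listed in debris."""
    unique = dict.fromkeys(words)   # dict keys preserve insertion order
    return [w for w in unique if w not in debris]

sample_words = ["ai", "https", "earth", "ai", "com", "earth", "epstein"]
cleaned = clean_keywords(sample_words)
print(cleaned)
```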

## BYOCT: Build Your Own Conspiracy Theory with ChatGPT

In [13]:
# We will use these instructions for all queries
instructions = """
You are an assistant designed to create texts.
"""

def chatgptquery(query):
    completion = openai.ChatCompletion.create(
        model = "gpt-3.5-turbo",
        temperature = 1.0,
        top_p = 0.5,
        # The standing instructions go in the system message
        messages=[{"role": "system", "content": instructions},
                  {"role": "user", "content": query}])
    tokens = completion.usage.total_tokens
    reply_content = completion.choices[0].message.content
    # Print the token usage and the generated text
    print(tokens)
    print(reply_content)

In [17]:
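# Keys are our words from the topic model
keys = random.sample(keywords, 3)
# The model text is one of the posts. random.choice returns the string itself,
# and the name model_post keeps the NMF `model` from being overwritten.
model_post = random.choice(substantive)

query = f"""
Create a text in the style of the text provided below
but change the topics to: {', '.join(keys)}.

{model_post}
"""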

chatgptquery(query)
print(keys)

363
Reddit is everywhere, it's on your phone, on your computer, in your conversations, in your thoughts. They say that they do not know if it is harmful or not, but it obviously is, it's not natural. Do they actually know what Reddit does and only pretend not to know? Are we being brainwashed? I heard that Reddit can act as a breeding ground for extremist ideologies, for example. It seems sinister to me. Also, it seems convenient to the entities that control Reddit to keep the idea that Reddit is harmless. These entities also control the media. Maybe it's somehow connected? Brainwashing the masses and then controlling the narrative? Not sure, possibly a loose connection. But one thing is for sure, Reddit is here to stay and we need to be cautious of its influence on our day-to-day lives.
['reddit', 'entities', 'day']
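
Since each run draws new keywords and a new model post, a given result is hard to reproduce. One way to make the draw repeatable is to do the sampling through a seeded `random.Random` instance. The helper below is a sketch under assumptions: `build_query` and its `seed` parameter are hypothetical, the demo inputs are stand-ins, and it only builds the prompt text without calling the API.

```python
import random

def build_query(keywords, posts, n_keys=3, seed=None):
    """Pick n_keys keywords and one model post; return (keys, query)."""
    rng = random.Random(seed)   # the same seed gives the same draw
    keys = rng.sample(keywords, n_keys)
    model_post = rng.choice(posts)
    query = (
        "Create a text in the style of the text provided below\n"
        f"but change the topics to: {', '.join(keys)}.\n\n"
        f"{model_post}\n"
    )
    return keys, query

demo_keywords = ["reddit", "entities", "day", "gold", "crop"]
demo_posts = ["First stand-in post.", "Second stand-in post."]
keys_a, query_a = build_query(demo_keywords, demo_posts, seed=42)
keys_b, query_b = build_query(demo_keywords, demo_posts, seed=42)
print(keys_a == keys_b and query_a == query_b)
```

Seeding fixes only the choice of keywords and model post. The model's reply can still vary from call to call at the sampling settings used here.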
