# GenCT

This notebook combines code blocks from the previous notebooks to let a user grab the current hot posts (here, 200) from `r/conspiracy`, extract topics from them, and then feed a random selection of topic keywords, along with a random post that serves as a stylistic model, to ChatGPT to generate a new post.

In [1]:
# IMPORTS
import os, praw, pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from nltk.tokenize import word_tokenize
import openai, random

# SECRET DECODER RING
webber_id = os.getenv("REDDIT_API_ID")
webber_secret = os.getenv("REDDIT_API_SECRET")
openai.api_key = os.getenv("OPENAI_API_KEY")

## Get Posts from Reddit

In [2]:
reddit = praw.Reddit(
    client_id=webber_id,
    client_secret=webber_secret,
    user_agent="webber"
    )
print(reddit.read_only)

True


The next block could be streamlined to capture only `post.selftext` and pass that list to the subsequent blocks.

In [25]:
conspiracy = reddit.subreddit('conspiracy')

data = []
for post in conspiracy.hot(limit=200):
    data.append([post.title, 
                  post.score, 
                  post.id, 
                  post.url, 
                  post.num_comments, 
                  post.selftext, 
                  post.created])

df = pd.DataFrame(data,
                     columns=['title', 'score', 'id', 'url', 'num_comments', 'post', 'created'])

df.shape

(200, 7)
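
The streamlining mentioned before this cell might look something like the sketch below. It is only an illustration: `collect_selftexts` and its `min_chars` parameter are hypothetical names, not part of the notebook, and the stand-in objects exist so the code runs without Reddit credentials. It assumes each submission exposes a `selftext` attribute, as the PRAW objects above do.

```python
from types import SimpleNamespace

def collect_selftexts(submissions, min_chars=300):
    """Return the body text of each submission with at least min_chars characters."""
    texts = []
    for post in submissions:
        body = post.selftext or ""  # link-only posts have an empty body
        if len(body) >= min_chars:
            texts.append(body)
    return texts

# Stand-in objects (hypothetical) so the sketch runs offline
fake_posts = [
    SimpleNamespace(selftext=""),
    SimpleNamespace(selftext="short"),
    SimpleNamespace(selftext="x" * 300),
]
substantive_demo = collect_selftexts(fake_posts)
print(len(substantive_demo))
```

With a live connection, the call would presumably be `collect_selftexts(conspiracy.hot(limit=200))`, which skips the DataFrame entirely.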

In [4]:
# df.to_csv("rc-posts-3.csv")

In [26]:
posts = df.post.tolist()
text = ' '.join(posts)
len(text)

157600

In [None]:
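# Drop missing values (these show up as NaN, e.g., when posts are reloaded from a CSV)
nonan = [x for x in posts if str(x) != 'nan']
# Keep only posts with at least 300 characters of body text
substantive = [post for post in nonan if len(post) > 299]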

## Get Topics from Posts

In [27]:
# Display Functions
# =-=-=-=-=-=-=-=-=-=-= 

def topic_details(model, feature_names, n_top_words):
    """Print each topic's top words along with their weights."""
    for topic_idx, topic in enumerate(model.components_):
        message = "{:d}: ".format(topic_idx)
        # argsort is ascending, so the reversed slice yields the largest weights first
        message += " ".join([feature_names[i] + ' ' + str(round(topic[i], 2)) + ','
                             for i in topic.argsort()[:-n_top_words - 1:-1]])
        print(message)
    print()

def just_words(model, feature_names, n_top_words):
    """Return a list with one space-separated string of top words per topic."""
    topics = []
    for topic in model.components_:
        words = " ".join([feature_names[i] for i in topic.argsort()[:-n_top_words - 1:-1]])
        topics.append(words)
    return topics
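
Both display functions rely on the slice `topic.argsort()[:-n_top_words - 1:-1]`, which is easy to misread. The toy example below (the weights and vocabulary are made up for illustration) shows that it selects the indices of the `n_top` largest weights, largest first, and that it is equivalent to reversing the sorted indices and taking the first `n_top`.

```python
import numpy as np

# Made-up weights and vocabulary, purely for illustration
weights = np.array([0.10, 0.90, 0.30, 0.70, 0.05])
vocab = np.array(["alpha", "bravo", "charlie", "delta", "echo"])
n_top = 3

ascending = weights.argsort()          # indices ordered from smallest to largest weight
top_idx = ascending[:-n_top - 1:-1]    # step backward from the end, n_top items
top_words = [str(vocab[i]) for i in top_idx]
print(top_words)

# The same selection written in a more explicit form
explicit_idx = ascending[::-1][:n_top]
```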

In [28]:
# SETTINGS
n_topics = 20
n_features = 5000
n_top_words = 10
n_top_documents = 5

# NMF is able to use tf-idf
vectorizer = TfidfVectorizer(max_df=0.9,
                             min_df=2,
                             max_features=n_features,
                             stop_words='english')
X = vectorizer.fit_transform(substantive)
features = vectorizer.get_feature_names_out()
features.shape

(1480,)

In [29]:
# the NMF topic model
model = NMF(n_components=n_topics, 
            init='random',
            random_state=0, 
            max_iter=500
)
W = model.fit_transform(X)
H = model.components_
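
In scikit-learn's NMF, `fit_transform` returns the document-by-topic matrix (`W` here) and `components_` holds the topic-by-feature matrix (`H`). The settings cell defines `n_top_documents`, but nothing in this section uses it. A minimal sketch of how `W` could be used to list the most representative posts per topic follows; `top_documents` and `preview_chars` are hypothetical names, and the small matrix is made-up demo data.

```python
import numpy as np

def top_documents(W, documents, n_top_documents=5, preview_chars=80):
    """For each topic (column of W), return previews of the highest-weighted documents."""
    W = np.asarray(W)
    results = {}
    for topic_idx in range(W.shape[1]):
        # Sort document indices by this topic's weight, largest first
        order = np.argsort(W[:, topic_idx])[::-1][:n_top_documents]
        results[topic_idx] = [documents[i][:preview_chars] for i in order]
    return results

# Made-up demo data: 3 documents, 2 topics
W_demo = np.array([[0.9, 0.0],
                   [0.1, 0.8],
                   [0.5, 0.2]])
docs_demo = ["doc a", "doc b", "doc c"]
demo = top_documents(W_demo, docs_demo, n_top_documents=2)
print(demo)
```

In the notebook, the equivalent call would presumably be `top_documents(W, substantive, n_top_documents)`.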

In [30]:
topic_details(model, features, n_top_words)

0: nature 0.38, god 0.33, world 0.31, people 0.19, make 0.18, like 0.18, consciousness 0.17, order 0.16, energy 0.16, question 0.15,
1: ones 0.28, really 0.26, different 0.19, chatgpt 0.18, wonder 0.16, share 0.15, facebook 0.13, repeatedly 0.12, results 0.12, starts 0.12,
2: joe 0.51, military 0.37, event 0.25, legal 0.22, epstein 0.22, picture 0.2, political 0.19, think 0.16, imagine 0.15, campaign 0.15,
3: x200b 1.72, days 1.02, 22 0.75, 11 0.66, august 0.59, year 0.57, 23 0.54, 223 0.49, later 0.47, 28 0.46,
4: crop 0.93, evidence 0.9, circles 0.64, burst 0.57, proof 0.43, inside 0.41, plant 0.4, real 0.33, youtu 0.31, video 0.3,
5: flat 1.44, earthers 1.02, earth 0.86, 5g 0.8, start 0.68, group 0.67, ll 0.65, media 0.35, believe 0.27, interested 0.26,
6: mossad 1.01, israel 0.77, agents 0.74, service 0.74, agent 0.7, secret 0.68, epstein 0.6, killed 0.46, died 0.45, conclusion 0.29,
7: america 1.03, effort 0.98, department 0.77, conditions 0.73, abuse 0.71, child 0.7, track 0.69, 

In [31]:
topics = just_words(model, features, 10)
for topic in topics:
    print(topic)

nature god world people make like consciousness order energy question
ones really different chatgpt wonder share facebook repeatedly results starts
joe military event legal epstein picture political think imagine campaign
x200b days 22 11 august year 23 223 later 28
crop evidence circles burst proof inside plant real youtu video
flat earthers earth 5g start group ll media believe interested
mossad israel agents service agent secret epstein killed died conclusion
america effort department conditions abuse child track report brothers mention
just thought ai think job data dont way 2024 good
people time went slowly says don maybe close caught bit
post reddit posts 100 despite hours freedom right accounts attempts
https png 17 webp redd format width auto preview enabled
uk bbc news www world material mr government covid https
ve wondering just evil place shit win gone accept certain
theory conspiracy favourite claim argue hidden claims user control devices
https com www youtube science tru

In [32]:
keyword_clusters = [word_tokenize(topic) for topic in topics]
keywords =  [word for sublist in keyword_clusters for word in sublist]
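
Because the same word can appear in several topics (`epstein` and `https` both do in the output above), the flattened `keywords` list contains duplicates, along with URL fragments and bare numbers that make poor prompts. One possible cleanup is sketched below. The function name, the `min_len` threshold, and the blocklist are all assumptions chosen for illustration, and `str.split` is swapped in for `word_tokenize` so that the sketch has no NLTK data dependency.

```python
def clean_keywords(topic_strings, min_len=3,
                   blocklist=("https", "www", "com", "png", "webp", "x200b")):
    """Flatten topic strings into unique keywords, dropping numbers and web debris."""
    seen = {}  # a dict preserves insertion order, so first occurrences keep their place
    for topic in topic_strings:
        for word in topic.split():
            if len(word) < min_len or word.isdigit() or word in blocklist:
                continue
            seen.setdefault(word, None)
    return list(seen)

# Abbreviated topic strings in the same shape as the output of just_words
demo_topics = [
    "mossad israel agents epstein",
    "joe military epstein 22",
    "https png 17 webp redd",
]
demo_keywords = clean_keywords(demo_topics)
print(demo_keywords)
```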

## BYOCT: Build Your Own Conspiracy Theory with ChatGPT

In [36]:
# We will use these instructions for all queries
instructions = """
You are an assistant designed to create texts.
"""

def chatgptquery(query):
    completion = openai.ChatCompletion.create(
        model = "gpt-3.5-turbo",
        temperature = 1.0,
        top_p = 0.5,
        # Standing instructions belong in the "system" role
        messages=[{"role": "system", "content": instructions},
                  {"role": "user", "content": query}])
    tokens = completion.usage.total_tokens
    reply_content = completion.choices[0].message.content
    # Print the token usage and the generated text
    print(tokens)
    print(reply_content)
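
The `openai.ChatCompletion` interface used above belongs to versions of the `openai` package before 1.0. In later versions, the call goes through a client object instead. The sketch below shows that shape under a few assumptions: `chat_query` is a hypothetical name, it returns the values rather than printing them, and a stand-in client is used so the example runs offline without an API key.

```python
from types import SimpleNamespace

def chat_query(client, query, instructions, model="gpt-3.5-turbo"):
    """Send one query and return (total_tokens, reply_text)."""
    completion = client.chat.completions.create(
        model=model,
        temperature=1.0,
        top_p=0.5,
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": query},
        ],
    )
    return completion.usage.total_tokens, completion.choices[0].message.content

# Stand-in client (hypothetical) so the sketch runs offline.
# With openai>=1.0 installed, the real client would be created with:
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
def _fake_create(**kwargs):
    reply = SimpleNamespace(content="echo: " + kwargs["messages"][-1]["content"])
    return SimpleNamespace(
        usage=SimpleNamespace(total_tokens=7),
        choices=[SimpleNamespace(message=reply)],
    )

fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)
tokens, text = chat_query(fake_client, "hello",
                          "You are an assistant designed to create texts.")
print(tokens, text)
```

Returning the reply instead of printing it makes it easier to store generated posts alongside the keywords and model post that produced them.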

In [40]:
# Model below is from r/conspiracy
# 7th top post of the day at 2023-06-03-0800
# https://www.reddit.com/r/conspiracy/comments/13zamrr/\
# foreign_powers_are_destabilizing_the_us/
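keys = random.sample(keywords, 3)
# random.choice returns the post itself rather than a one-item list;
# a separate name avoids overwriting the NMF `model` defined earlier
model_post = random.choice(substantive)

query = f"""
Create a text in the style of the text provided below
but change the topics to: {', '.join(keys)}.

{model_post}
"""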

chatgptquery(query)
print(keys)

427
Here's a little story that can fill you in on what I know.

A recent study says that the best way to stay healthy is to exercise regularly and eat a balanced diet. However, not everyone has the time or motivation to do so. 

Ones who are looking for a quick and easy way to improve their health might consider using wearable devices such as fitness trackers or smartwatches. These devices can track your activity levels, heart rate, and even monitor your sleep patterns. 

But be careful when choosing which devices to use. Some cheaper options may not be as accurate or reliable as others. It's important to do your research and read reviews before making a purchase. 

Anyone have any advice on which devices are the best for tracking fitness and health? 

Why I posted it in technology - I think wearable devices are the future of health and fitness. Not sure. But with all the advancements in technology, this might be the place to uncover something new and exciting. 

Thanks.
['says', 'ones