<img src='assets/bulldogs.jpg'>

# Welcome to Twitter-topshotusername-extractor

# Written By: Max Paul
### Editor/Mentor: Kevin Mentzer, PhD


```This tool was inspired and created from the work of an internet friend. Thank you for showing me what you do.```


```This tool stores Twitter data about topshot giveaways in a PostgreSQL database. Once Streamer.ipynb has stored the tweets, the user can run final_push.ipynb to clean them, leaving behind only a topshot username.```

```One potential use case for this project is identifying Twitter users who tweet out more than one distinct TS username, which in some (not all) cases means a single person has more than one topshot account.```

```Another interesting use case could be personalized marketing campaigns through Twitter. This database will allow us to connect a real Twitter user to the database created by Beau Gunderson (the foreign key being the topshot username). We could leverage users' past sales to show them upcoming events and/or moments they may be interested in.```


***The NBATopShot community is unique and special: it was one of the first successful projects to bring blockchain to people who still thought of it as something of the future. TopShot brings the excitement of blockchain into the industry of sports and sports collectibles.***




***NBATopShot is notably involved with its community on Twitter, where moments can be recommended by tagging “#NBATopShotThis”. In March of 2021, there was a pattern of Twitter accounts doing moment giveaways. To enter, a user would reply to the giveaway, tagging three (3) friends and including their TopShot username in the reply (see below).***

<img src='assets/1.png'> <img src='assets/2.png'><br>



***Additionally, after collecting 100,000 tweets, one syntax clearly dominated: “TS:” or “TS-” followed by the username. After storing the data in a PostgreSQL database and exploring it, we found some Twitter accounts that used different TopShot usernames for giveaways. Along with identifying a person with multiple TopShot accounts, we can retrieve Twitter data on the user (real name, location, etc.). This would allow us to map a real person’s information to their TopShot username and their Flow address from the TopShot PostgreSQL database.***
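
The “TS:” / “TS-” convention described above lends itself to a regular expression. The following is a minimal sketch, not the cleaning logic used in final_push.ipynb: the function name `extract_ts_username`, the pattern itself, and the assumption that usernames consist only of letters, digits, and underscores are all hypothetical.

```python
import re

# Assumed pattern: "TS", then ":" or "-", optional whitespace, then a username
# made of letters, digits, and underscores. The leading \b keeps "TS" from
# matching inside longer words such as "BOTS:".
TS_PATTERN = re.compile(r"\bTS\s*[:\-]\s*([A-Za-z0-9_]+)")


def extract_ts_username(tweet_text):
    """Return the first username that follows "TS:" or "TS-", or None."""
    match = TS_PATTERN.search(tweet_text)
    return match.group(1) if match else None


# Made-up sample tweets, for illustration only
print(extract_ts_username("@a @b @c TS: hoopfan23"))
print(extract_ts_username("TS-dunk_king great giveaway!"))
print(extract_ts_username("no username here"))
```

Real tweets will contain variations this pattern misses (lowercase "ts", usernames with other characters), so the pattern would likely need tuning against the collected data.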



# Requirements

- Python
	- twython (imported by Streamer.ipynb)
	- psycopg2
	- pandas
	- re (part of the standard library)
	- sqlalchemy
- Local PostgreSQL database
	- ability to run the pre-made SQL scripts that create the database
	- understanding of basic queries to extract the info needed

# Files

This project contains a few different files, and here is what they do.


# tables.sql

- create a database in pgAdmin with the name of your choosing
- enter the query tool and execute the `CREATE TABLE` commands in order
	- twitter_user
	- twitter_tweet
	- topshot_username
### Below is the ERD for this database
- each user can tweet many times
- each user can tweet more than one topshot name
	- this is what reveals people with duplicate accounts (bad!!!)

<img src='assets/Picture1.jpg'>
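
The relationships in the ERD can be sketched with an in-memory SQLite database from the Python standard library. This is an illustration only, not the contents of tables.sql: the column lists are trimmed, the column types and the `topshot_username` columns (`user_id`, `ts_username`) are assumptions, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE twitter_user (
        user_id  INTEGER PRIMARY KEY,
        username TEXT
    );
    CREATE TABLE twitter_tweet (
        tweet_id INTEGER PRIMARY KEY,
        tweet    TEXT,
        user_id  INTEGER REFERENCES twitter_user (user_id)
    );
    -- Assumed layout: user_id is not a primary key here,
    -- so one Twitter user can be linked to many topshot names.
    CREATE TABLE topshot_username (
        user_id     INTEGER REFERENCES twitter_user (user_id),
        ts_username TEXT
    );
""")

# Made-up sample data
conn.executemany(
    "INSERT INTO twitter_user VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)
conn.executemany(
    "INSERT INTO topshot_username VALUES (?, ?)",
    [(1, "dunk_king"), (1, "hoopfan23"), (2, "bobshots")],
)

# Twitter users who have tweeted more than one distinct topshot username
multi_account_users = conn.execute("""
    SELECT u.username, COUNT(DISTINCT t.ts_username) AS n_names
    FROM twitter_user AS u
    JOIN topshot_username AS t ON t.user_id = u.user_id
    GROUP BY u.user_id, u.username
    HAVING COUNT(DISTINCT t.ts_username) > 1
""").fetchall()
print(multi_account_users)  # [('alice', 2)]
conn.close()
```

A similar `GROUP BY ... HAVING` query should work against the PostgreSQL tables, provided the real column names are substituted.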


# query.sql

- storage place for frequently used queries
	- includes the query that gets all tweets containing "TS" from the last n days
		- this is the query used in final_push.ipynb
		- you can choose how many days to look back, in case you forget to run final_push.ipynb one day
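
As a rough sketch of what such a query might look like when run through psycopg2 (the actual text lives in query.sql and may differ), the helper below builds a parameterized statement. The function name `build_recent_ts_query`, the `LIKE` filter, and the cast of `created_at` to a date are assumptions; the table and column names are taken from the INSERT statements in Streamer.ipynb.

```python
def build_recent_ts_query(days_back=1):
    """Return (sql, params) selecting tweets containing "TS" from the last `days_back` days."""
    if days_back < 0:
        raise ValueError("days_back must be non-negative")
    sql = (
        "SELECT tweet_id, user_id, tweet "
        "FROM twitter_tweet "
        "WHERE tweet LIKE %s "
        # Assumes created_at can be cast to a date; adjust to the real column type
        "AND created_at::date >= CURRENT_DATE - %s;"
    )
    params = ("%TS%", days_back)
    return sql, params


sql, params = build_recent_ts_query(days_back=2)
print(sql)
print(params)
# With an open psycopg2 cursor named `cur`, this would be run as:
#   cur.execute(sql, params)
```

Passing the values as parameters rather than formatting them into the string lets psycopg2 handle the quoting.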


# Streamer.ipynb


### THIS FILE STARTS THE TWITTER LISTENER TO YOUR DATABASE
- Things to do before running
	- insert your Twitter Developer keys (bottom of file)
	- insert your PostgreSQL connection information
		- host
		- database
		- port
		- user
		- password
	- Once these items are in place and your PostgreSQL tables have been created, you are ready to collect tweets.
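
One way to avoid hard-coding these values in the notebook is to read them from environment variables. The sketch below is only a suggestion and not part of Streamer.ipynb: the helper name `load_pg_settings` and the defaults are assumptions, and the variable names are borrowed from libpq's `PGHOST`-style conventions.

```python
import os


def load_pg_settings(env=None):
    """Collect the five connection values from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return {
        "host": env.get("PGHOST", "localhost"),
        "database": env["PGDATABASE"],      # required, no default
        "port": int(env.get("PGPORT", "5432")),
        "user": env.get("PGUSER", "postgres"),
        "password": env["PGPASSWORD"],      # required, no default
    }


# Example with an explicit mapping (placeholder values only)
settings = load_pg_settings({"PGDATABASE": "my_db", "PGPASSWORD": "change-me"})
print(settings["host"], settings["port"])
# With psycopg2 installed, the dict can be unpacked into connect():
#   conn = psycopg2.connect(**settings)
```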



# final_push.ipynb

## BE SURE TO EDIT QUERY IN FINAL PUSH BASED ON YOUR PUSHING SCHEDULE

`CURRENT_DATE - 1` is equivalent to yesterday

 - edit the query in code cell 2
   - it takes the tweets with topshot
   - it also filters on date so you don't put the same exact data in twice
   - if we made user_id the primary key, we would not be able to ID users using multiple accounts

- Things to do before running
	- insert your PostgreSQL connection information
		- host
		- database
		- port
		- user
		- password
	- at the end of the file, make sure you connect SQLAlchemy to your database so you can write the new data there all at once instead of looping over each item.
	- Once these items are in place, you are ready to insert the cleaned data
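
Connecting SQLAlchemy typically comes down to building a database URL and passing it to `create_engine`. The sketch below only builds the URL with the standard library. The helper name `build_postgres_url` and the sample values are hypothetical, and the commented lines show how the result might be used with pandas, assuming a DataFrame named `cleaned_df` and a target table named `topshot_username`.

```python
from urllib.parse import quote


def build_postgres_url(host, database, port, user, password):
    """Build a SQLAlchemy URL for the psycopg2 driver, percent-encoding the credentials."""
    safe_user = quote(str(user), safe="")
    safe_password = quote(str(password), safe="")
    return f"postgresql+psycopg2://{safe_user}:{safe_password}@{host}:{port}/{database}"


# Placeholder values only
url = build_postgres_url("localhost", "my_db", 5432, "postgres", "p@ss/word")
print(url)

# With sqlalchemy and pandas installed, a bulk write might look like:
#   engine = sqlalchemy.create_engine(url)
#   cleaned_df.to_sql("topshot_username", engine, if_exists="append", index=False)
```

Percent-encoding matters when the password contains characters such as `@` or `/`, which would otherwise be read as part of the URL structure.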


In [None]:
from twython import TwythonStreamer  
import csv
import codecs
import json
import time
# Filter out unwanted data


# This function lets us specify which items we want to retrieve from the Twitter API
def process_tweet(tweet):  
    d = {}
    d['hashtags'] = [hashtag['text'] for hashtag in tweet['entities']['hashtags']]
    d['tweet_id'] = tweet['id']
    d['created_at'] = tweet['created_at']
    d['text'] = tweet['text']
    d['name'] = tweet['user']['name']
    d['user'] = tweet['user']['screen_name']
    d['user_id'] = tweet['user']['id']
    d['user_loc'] = tweet['user']['location']
    d['user_desc'] = tweet['user']['description']
    d['user_followers'] = tweet['user']['followers_count']
    d['user_friends'] = tweet['user']['friends_count']
    d['user_listed'] = tweet['user']['listed_count']
    d['user_created'] = tweet['user']['created_at']
    d['user_favs'] = tweet['user']['favourites_count']
    d['user_verified'] = tweet['user']['verified']
    d['user_statuses'] = tweet['user']['statuses_count']

    return d


# Create a class that inherits TwythonStreamer
class MyStreamer(TwythonStreamer):     

    # Received data
    def on_success(self, data):

        # Save full JSON to file, one object per line so the file can be parsed later
        with open('topGiveaway.json', 'a') as jsonfile:
            json.dump(data, jsonfile)
            jsonfile.write('\n')

        # Only collect tweets in English; .get() skips stream messages that have no 'lang' key
        if data.get('lang') == 'en':
            tweet_data = process_tweet(data)
            self.write_to_pgadmin(tweet_data)
            self.save_to_csv(tweet_data)
            #would call mongo function here to write it to the database

    # Problem with the API
    def on_error(self, status_code, data):
        print(status_code, data)
        self.disconnect()

    # Save each tweet to csv file (newline='' avoids blank rows on Windows)
    def save_to_csv(self, tweet):
        with open('topgiveaway.csv', 'a', encoding="utf8", newline='') as file:
            writer = csv.writer(file)
            writer.writerow(list(tweet.values()))
            
    def write_to_pgadmin(self,tweet):
        import psycopg2
        conn = psycopg2.connect(host="localhost",
                    database="top-twit-mapping",
                    port=5432,
                    user='postgres',
                    password=3301)  # use your own password; an all-digit password needs no quotes
        cur = conn.cursor()
        # insert user information
        command = '''INSERT INTO twitter_user (realname,username,user_id,user_loc,user_desc,user_followers,user_friends,user_listed,user_created,user_favs,user_verified,user_statuses) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s) ON CONFLICT
                     (user_id) DO NOTHING;'''
        cur.execute(command,(tweet['name'],tweet['user'],tweet['user_id'],tweet['user_loc'],tweet['user_desc'],tweet['user_followers'],tweet['user_friends'],tweet['user_listed'],tweet['user_created'],tweet['user_favs'],tweet['user_verified'],tweet['user_statuses']))     
       
        command2 = '''INSERT INTO twitter_tweet (tweet_id,created_at,tweet,user_id) VALUES (%s,%s,%s,%s) ON CONFLICT
                     (tweet_id) DO NOTHING;'''
        cur.execute(command2,(tweet['tweet_id'],tweet['created_at'],tweet['text'],tweet['user_id']))
        
        conn.commit()
        
        cur.close()
        
        conn.close()
        
while True:
    try:
        # Instantiate from our streaming class
        # enter your Developer Keys HERE
        stream = MyStreamer('','','','')

        # Start the stream
        stream.statuses.filter(track='@NBATopShot,TS') #Track uses comma separated list 
    except (KeyboardInterrupt):
        print('Exiting')
        break
    except Exception as e:
        # Log the error, wait briefly, then reconnect
        print("error - sleeping " + str(e))
        time.sleep(5)
        continue

 

