**Overview**

For my mini project, I explored Twitter data through the Twitter API. As a reminder, I mixed up the MP1 and A5 assignments, so this mini project uses the same data as A5. I hope that's ok! I haven't seen any feedback on A5 or MP1, so I just went with it.

Here are the topics I examined:

- How do trending Twitter topics in Seattle compare from one week to the next?
- How much overlap is there between trending Twitter topics in Seattle and the topics on my personal timeline?

These topics are interesting to me because I follow Twitter for a lot of local news and perspectives. So it will be interesting to see how closely my personal feed follows local trends.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory


        
import os
for dirname, _, filenames in os.walk('/kaggle/working'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


# Any results you write to the current directory are saved as output.

**Data Profile**

I looked at this page for reference on how to access the Twitter API: 
http://socialmedia-class.org/twittertutorial.html

To use the Twitter API from Python, you need the tweepy library (pip install tweepy). You also need four credentials: a consumer key, a consumer secret, an access token, and an access token secret.

Below I import tweepy, load the four credentials from my Kaggle secrets, and use them to authenticate.


In [None]:

import json
import tweepy


from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
CONSUMER_KEY = user_secrets.get_secret("TwitterKey")
CONSUMER_SECRET = user_secrets.get_secret("TwitterSecretKey")
ACCESS_SECRET  = user_secrets.get_secret("TwitterSecretToken")
ACCESS_TOKEN = user_secrets.get_secret("TwitterToken")


auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, compression=True)


Just to test that it worked, the code below prints the tweets from my home timeline and then saves them to a JSON file:

In [None]:

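# Collect every status in a list so the whole timeline can be saved,
# not just the last tweet the loop happened to reach.
statuses = [status._json for status in tweepy.Cursor(api.home_timeline).items(500)]
print(json.dumps(statuses, indent=4))


In [None]:

with open('homepage0225.txt', 'w') as outfile:
    json.dump(statuses, outfile, indent=4)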

The below code prints a json of the trending topics for a location based on the WOEID. I found the WOEID for Seattle through some Google searching.

In [None]:
sea_trends = api.trends_place(id = 2490383)
print(json.dumps(sea_trends, indent=4))

However, the above output isn't structured the way I need. The response is a list containing a single dictionary, and all of the individual trends are nested under its "trends" key. If I were to plot this as is, it wouldn't be very interesting data. What I want is to parse out the values one level down, under "name" and "tweet_volume". So I need to tell the JSON dump to pull the "trends" list out of the 0th item of the response.

In [None]:
# Now I'm pulling out the list of individual trends, rather than leaving them all nested under the "trends" key.

sea_trends = api.trends_place(id = 2490383)
print(json.dumps(sea_trends[0]["trends"], indent = 4))
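To make the nesting concrete, here is a small sketch that does not call the API. The `sample_response` value is made up to mimic the shape described above (a one-element list whose dictionary holds a "trends" list); the names and volumes are placeholders, not real Twitter data.

```python
import json

# Made-up sample mimicking the nesting described above; the names and
# volumes are illustrative placeholders, not real Twitter data.
sample_response = [
    {
        "trends": [
            {"name": "#ExampleTopic", "tweet_volume": 12000},
            {"name": "Example Team", "tweet_volume": None},
        ],
        "locations": [{"name": "Seattle", "woeid": 2490383}],
    }
]

# The response is a one-element list, so index 0 comes first,
# then the "trends" key gives the list of individual trends.
trends = sample_response[0]["trends"]

# Pull out just the two fields of interest for each trend.
summary = [(t["name"], t["tweet_volume"]) for t in trends]
print(json.dumps(summary, indent=4))
```

A trend with no reported volume shows up as `None` here, which is what later turns into NaN once the file is read into pandas.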

Next, I want to create a json file with the above data.

In [None]:
with open('sea_trends0225.txt', 'w') as outfile:
    json.dump(sea_trends[0]["trends"], outfile, indent=4)
    
# The code below prints the path of my new file, as well as a couple of files I created
# earlier this week with trends from 2/17 and 2/23. These dates are a bit arbitrary, but it's what I've got!
import os

for directory in ('/kaggle/working', '/kaggle/input'):
    for dirname, _, filenames in os.walk(directory):
        for filename in filenames:
            print(os.path.join(dirname, filename))

**Analysis**

To do my analysis, I'm importing matplotlib and pandas and creating dataframes out of my json data. I made one dataframe for each date that I have data for.

In [None]:

import matplotlib.pyplot as plt
import pandas as pd


df17 = pd.read_json("/kaggle/input/sea_trends0217.txt")
df23 = pd.read_json("/kaggle/input/sea_trends0223.txt")
df25 = pd.read_json("/kaggle/working/sea_trends0225.txt")

The code below drops the rows with NaN in the tweet_volume column from the sea_trends data and plots each date as a bar chart. I only want to look at Twitter trends with actual values in the tweet_volume field. However, this shows the data on three separate charts. I want to overlay them on top of each other to see which trending topics extended from last week into this week.

In [None]:
# Keep only the trends that report a tweet volume.
df17 = df17[df17['tweet_volume'].notna()]
df23 = df23[df23['tweet_volume'].notna()]
df25 = df25[df25['tweet_volume'].notna()]

# One bar chart per date, with the trend names along the x-axis.
df17.plot(kind="bar", x="name", y="tweet_volume", color="red", figsize=(20,10))
df23.plot(kind="bar", x="name", y="tweet_volume", color="blue", figsize=(20,10))
df25.plot(kind="bar", x="name", y="tweet_volume", color="green", figsize=(20,10))
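One simple way to check which topics carried over between two dates, separate from any chart, is to compare the sets of trend names. This is only a sketch: `trend_names`, `week1`, and `week2` are hypothetical names and the data is invented for illustration.

```python
def trend_names(trends):
    """Return the set of names for trends that report a tweet volume."""
    return {t["name"] for t in trends if t.get("tweet_volume") is not None}


# Hypothetical trend lists for two dates; not real data.
week1 = [
    {"name": "Topic A", "tweet_volume": 5000},
    {"name": "Topic B", "tweet_volume": None},
    {"name": "Topic C", "tweet_volume": 800},
]
week2 = [
    {"name": "Topic C", "tweet_volume": 1200},
    {"name": "Topic D", "tweet_volume": 300},
]

names1, names2 = trend_names(week1), trend_names(week2)
persisted = sorted(names1 & names2)  # trending on both dates
dropped = sorted(names1 - names2)    # only on the earlier date
new = sorted(names2 - names1)        # only on the later date
print(persisted, dropped, new)
```

The same idea should work on the saved files by loading each one with `json.load` and passing the resulting list to `trend_names`.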

I found some instructions in the pandas documentation about how to combine different charts into one. So I did that below and added some pretty colors! I also updated the label names to be more descriptive and increased the font size of the trend names.

In [None]:
# Keep only the trends that report a tweet volume.
df17 = df17[df17['tweet_volume'].notna()]
df23 = df23[df23['tweet_volume'].notna()]
df25 = df25[df25['tweet_volume'].notna()]

# All three plots share one Axes. Bars are placed by row position, so bars at the
# same x position can belong to different topics, and the tick labels come from
# the last plot drawn.
ax = df25.plot(kind="bar", x="name", y="tweet_volume", color="mediumspringgreen",
               label="Number of Tweets 2/25", fontsize=22, figsize=(20,15))

ax2 = df23.plot(kind="bar", x="name", y="tweet_volume", color="seagreen", figsize=(20,15),
                label="Number of Tweets 2/23", fontsize=22, ax=ax)

df17.plot(kind="bar", x="name", y="tweet_volume",
          label="Number of Tweets 2/17", color="darkgreen", fontsize=22, figsize=(20,15), ax=ax2)
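Because overlaid bar charts line bars up by position rather than by topic, an alternative is to first build one table with a row per trend name and a column per date. The sketch below is one possible way to do that with pandas; `combine_by_name`, `df_a`, and `df_b` are hypothetical names standing in for the real DataFrames, the data is invented, and it assumes each trend name appears at most once per date.

```python
import pandas as pd


def combine_by_name(frames):
    """Build one table with a row per trend name and a column per date."""
    columns = []
    for label, df in frames.items():
        # Drop trends without a reported volume, then index by trend name.
        kept = df[df["tweet_volume"].notna()]
        columns.append(kept.set_index("name")["tweet_volume"].rename(label))
    # Aligning on the name index leaves NaN where a topic did not trend that day.
    return pd.concat(columns, axis=1)


# Hypothetical stand-ins for two of the dated DataFrames; not real data.
df_a = pd.DataFrame({"name": ["Topic A", "Topic C"], "tweet_volume": [5000, 800]})
df_b = pd.DataFrame({"name": ["Topic C", "Topic D"], "tweet_volume": [1200, None]})

comparison = combine_by_name({"2/17": df_a, "2/23": df_b})
print(comparison)
```

Calling `comparison.plot(kind="bar", figsize=(20, 10))` on the result should then draw one group of bars per topic, with one bar per date, so that topics trending on more than one date sit side by side.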

**Conclusion / Future Work**

Unfortunately, I wasn't able to figure out how to parse the data from my Twitter home page. There's so much NaN data that I can't really tell what's happening there, or how to find the keywords that would correspond to my "trending topics" files.

My future work would definitely involve parsing the home page data and figuring out how to visualize it in the same way I did for the trending topics data. Unfortunately I'm out of time for this assignment! 

In [None]:
homepage = pd.read_json("/kaggle/working/homepage0225.txt")
homepage
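As a starting point for that future work, one possible approach is to skip the DataFrame and search the tweet text directly for each trend name. The sketch below assumes the timeline is available as a list of tweet dictionaries that each carry a `text` field; `trends_in_timeline` is a hypothetical helper and the tweets are invented for illustration.

```python
def trends_in_timeline(trend_names, tweets):
    """Count how many tweets mention each trend name (case-insensitive)."""
    counts = {}
    for name in trend_names:
        needle = name.lower()
        counts[name] = sum(
            1 for tweet in tweets if needle in tweet.get("text", "").lower()
        )
    return counts


# Invented tweets standing in for the saved home timeline.
tweets = [
    {"text": "Big day for #ExampleTopic in Seattle"},
    {"text": "Nothing to see here"},
    {"text": "More on #exampletopic later"},
]

counts = trends_in_timeline(["#ExampleTopic", "Topic D"], tweets)
print(counts)
```

Plain substring matching is a deliberately simple choice: it will miss paraphrases and may match short names inside longer words, so the counts are only a rough measure of overlap.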