# Social Network Analysis
## AO3 User Bookmarks

The following is an analysis of the network created by users bookmarking fics on AO3.

We used the [Social Network Analysis with Python](https://www.kirenz.com/post/2019-08-13-network_analysis/) article by Jan Kirenz as a reference for our analysis.

In [1]:
# Imports
import json
import warnings

import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

%matplotlib inline
warnings.simplefilter('ignore')  # silence library warnings in the notebook output

First, we read in the data and reshape it so that it can be represented as a graph.

In [2]:
#read in data
topfic_bookmarks = pd.read_json('../ao3bot/top_fic_bookmarks.json')
topfic_bookmarks.head()


| | fandom | total_bookmarks | users | fic_info |
|---|---|---|---|---|
| 0 | 僕のヒーローアカデミア \| Boku no Hero Academia \| My Hero ... | 16100 | [{'user_link': '/users/whole_grain_bagel/pseud... | {'fic_link': '/works/8337607', 'fic_name': 'Ye... |
| 1 | Haikyuu!! | 11388 | [{'user_link': '/users/artisticDragon074639/ps... | {'fic_link': '/works/5096105', 'fic_name': 'In... |
| 2 | Naruto | 8451 | [{'user_link': '/users/AllCrush/pseuds/AllCrus... | {'fic_link': '/works/8211566', 'fic_name': 'Of... |
| 3 | Shingeki no Kyojin \| Attack on Titan | 3809 | [{'user_link': '/users/Frank_boi/pseuds/Frank_... | {'fic_link': '/works/2336534', 'fic_name': 'Ei... |
| 4 | Miraculous Ladybug | 4335 | [{'user_link': '/users/cook999/pseuds/cook999'... | {'fic_link': '/works/7568518', 'fic_name': 'Pi... |


In [3]:
user_df = pd.read_json('../ao3bot/user_bookmarks_final.json')
user_df.head()

| | user | fic_info | fandoms | user_stats |
|---|---|---|---|---|
| 0 | Watermelonflooff | [[{'fic_link': '/works/8337607', 'fic_name': '... | [[{'fandom_link': '/tags/%E5%83%95%E3%81%AE%E3... | {'total_bookmarks': 18, 'ratings': {'Explicit'... |
| 1 | SanguineRoar | [[{'fic_link': '/works/24273403', 'fic_name': ... | [[{'fandom_link': '/tags/%E5%83%95%E3%81%AE%E3... | {'total_bookmarks': 263, 'ratings': {'Explicit... |
| 2 | duhvy | [[{'fic_link': '/works/8337607', 'fic_name': '... | [[{'fandom_link': '/tags/%E5%83%95%E3%81%AE%E3... | {'total_bookmarks': 53, 'ratings': {'Teen And ... |
| 3 | Sther_2515 | [[{'fic_link': '/works/8337607', 'fic_name': '... | [[{'fandom_link': '/tags/%E5%83%95%E3%81%AE%E3... | {'total_bookmarks': 143, 'ratings': {'Teen And... |
| 4 | BlackCat666 | [[{'fic_link': '/series/56292', 'fic_name': 'T... | [[{'fandom_link': '/tags/Teen%20Wolf%20(TV)/wo... | {'total_bookmarks': 410, 'ratings': {'Teen And... |


Each row of `user_df` holds a `user` whose bookmarks we scraped, the `fic_info` of those bookmarks, and the `fandoms` associated with them. The `user_stats` column holds summary statistics for that user's bookmarks.

`topfic_bookmarks` holds the top fics in each fandom we scraped, along with the users who bookmarked each one. This file was the starting point that guided the scraping for the `user_df` data.
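
As a rough illustration of the nested `fic_info` layout, the sketch below flattens one cell into a plain list of links. It assumes, based on the previews above, that each cell is a list of lists of dicts and that some entries may lack a `fic_link` key. The toy values and the helper name `bookmark_links` are made up for this example.

```python
# Toy stand-in for one `fic_info` cell: an outer list of inner lists of dicts.
# The links and names here are invented for illustration only.
fic_info = [
    [
        {"fic_link": "/works/1", "fic_name": "First"},
        {"fic_link": "/works/2", "fic_name": "Second"},
    ],
    [
        {"fic_name": "Entry without a link"},
        {"fic_link": "/series/3", "fic_name": "Third"},
    ],
]


def bookmark_links(cell):
    """Return every fic_link in a nested fic_info cell, skipping entries without one."""
    return [
        entry["fic_link"]
        for group in cell      # outer list
        for entry in group     # inner list of bookmark dicts
        if "fic_link" in entry
    ]


links = bookmark_links(fic_info)
print(links)
```

On the real data, the same helper could be applied per row, for example with `user_df["fic_info"].apply(bookmark_links)`.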

### Common Bookmarks

In this analysis, we model the data as follows:
- Nodes: fics
- Edges: two fics are connected when the same user has bookmarked both

To do this we will need to wrangle the data into the correct format.
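
To make the node and edge definition concrete, here is a minimal sketch on toy data. It assumes two hypothetical lookups, `top_fic_users` (top fic to the users who bookmarked it) and `user_bookmarks` (user to every fic link they bookmarked). Neither exists in the notebook, and the usernames and links are invented. It pairs each top fic only with the other bookmarks of its own bookmarkers.

```python
# Hypothetical toy data: who bookmarked each top fic,
# and every fic link each of those users has bookmarked.
top_fic_users = {
    "/works/1": ["ana", "bo"],
    "/works/2": ["bo"],
}
user_bookmarks = {
    "ana": ["/works/1", "/works/10"],
    "bo": ["/works/1", "/works/2", "/works/20"],
}

edges = []
for top_fic, bookmarkers in top_fic_users.items():
    for user in bookmarkers:
        # Users missing from the lookup simply contribute no edges.
        for other_fic in user_bookmarks.get(user, []):
            if other_fic != top_fic:  # no self-loops
                edges.append((top_fic, other_fic))

print(edges)
```

Using dictionaries for the lookups avoids rescanning a DataFrame for every user, which may matter once the edge list grows to hundreds of thousands of rows.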

In [27]:
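# Bookmarker lists and fic metadata for each top fic
tf_users = topfic_bookmarks["users"].tolist()
topfics = topfic_bookmarks["fic_info"].tolist()

# Build [top fic, bookmarked fic] pairs; these become the edges of the graph
fics = []
for fic in topfics:
    # Note: this walks the bookmarker lists of every top fic, not only those of `fic`
    for bookmarkers in tf_users:
        for user in bookmarkers:
            # fic_info cells for this user (empty if the user is not in user_df)
            user_rows = user_df.loc[user_df["user"] == user["user_name"], "fic_info"]
            for fic_info in user_rows:
                for group in fic_info:
                    for bookmark in group:
                        # Skip entries without a link, and skip self-pairs
                        if "fic_link" in bookmark and fic["fic_link"] != bookmark["fic_link"]:
                            fics.append([fic["fic_link"], bookmark["fic_link"]])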

In [28]:
fics_df = pd.DataFrame(fics, columns = ["ficA", "ficB"])
fics_df

| | ficA | ficB |
|---|---|---|
| 0 | /works/8337607 | /works/4576425 |
| 1 | /works/8337607 | /works/31830625 |
| 2 | /works/8337607 | /works/30880025 |
| 3 | /works/8337607 | /works/33491398 |
| 4 | /works/8337607 | /works/33470218 |
| ... | ... | ... |
| 549339 | /works/1113606 | /works/1239886 |
| 549340 | /works/1113606 | /works/34433314 |
| 549341 | /works/1113606 | /works/28610427 |
| 549342 | /works/1113606 | /works/19870906 |


In [42]:
# The graph could be built and drawn here, but we export the edge list and use separate graphing software, which is faster
fics_df.to_csv("common_fics_edges.csv", index=False, header=False)
# Optional in-notebook preview using only the first 10,000 edges:
# g = nx.from_pandas_edgelist(fics_df.head(10000), "ficA", "ficB")
# print(g)  # node and edge counts; nx.info(g) is not available in NetworkX 3.x
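
Because the same pair of fics can appear many times in the edge list, it may help to collapse duplicates into a single weighted edge before exporting. The sketch below is one possible way to do that using only the standard library. The toy `edges` list is invented, and treating `(A, B)` and `(B, A)` as the same edge is an assumption that fits an undirected graph.

```python
from collections import Counter

# Toy edge list standing in for the rows of fics_df (values are made up).
edges = [
    ("/works/1", "/works/2"),
    ("/works/2", "/works/1"),  # same undirected pair, reversed
    ("/works/1", "/works/3"),
    ("/works/1", "/works/2"),
]

# Sort each pair so (A, B) and (B, A) count as the same undirected edge.
weights = Counter(tuple(sorted(pair)) for pair in edges)

# One (ficA, ficB, weight) row per distinct pair.
weighted_edges = [(a, b, w) for (a, b), w in weights.items()]
print(weighted_edges)
```

With the real data, the pairs could be supplied by `zip(fics_df["ficA"], fics_df["ficB"])` instead of the toy list.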

In [39]:
# pos = nx.spring_layout(g)
# betCent = nx.betweenness_centrality(g, normalized=True, endpoints=True)
# node_color = [20000.0 * g.degree(v) for v in g]
# node_size =  [v * 10000 for v in betCent.values()]
# plt.figure(figsize=(10,10))
# nx.draw_networkx(g, pos=pos, with_labels=False,
#                  node_color=node_color,
#                  node_size=node_size )
# plt.axis('off');
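
The commented-out plot sizes nodes by betweenness centrality and colors them by degree. As a small, hedged illustration of those two measures, the sketch below computes them on an invented five-node graph. The node names are placeholders, not fics from the data.

```python
import networkx as nx

# Toy graph: "hub" connects to three nodes, and "c" connects onward to "d".
g = nx.Graph()
g.add_edges_from([("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d")])

# Same call as in the commented-out cell above.
bet_cent = nx.betweenness_centrality(g, normalized=True, endpoints=True)
degrees = dict(g.degree())

# The node that the most shortest paths pass through.
most_central = max(bet_cent, key=bet_cent.get)
print(most_central, degrees[most_central])
```

In the fic network, a node with high betweenness would be a fic that links otherwise separate groups of bookmarks, while a high degree just means it shares bookmarkers with many other fics.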