In [2]:
## Finding Guido

import os

from dotenv import load_dotenv
from tweepy import Client

load_dotenv()  # loads variables such as TWITTER_BEARER_TOKEN from a local .env file

bearer_token = os.environ.get("TWITTER_BEARER_TOKEN")

client = Client(bearer_token)

In [7]:
user_response = client.get_user(username="gvanrossum", user_fields=["description", "location", "public_metrics"])

In [67]:
gvr = user_response[0].id
gvr

15804774

Who does Guido van Rossum follow? I originally made the mistake of requesting his "followers" instead of the accounts he is "following"; don't do that! Because I find it really confusing to talk about followers and following, I'm going to refer to the accounts that GVR follows as his "subscriptions". GVR subscribes to just over 5

In [33]:
gvr_followers = client.get_users_following(15804774, user_fields=["description", "location", "public_metrics"], max_results=1000)

In [39]:
%pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Collecting pandas
  Downloading pandas-1.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.1 MB)
Collecting numpy>=1.21.0
  Downloading numpy-1.23.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
Installing collected packages: numpy, pandas
Successfully installed numpy-1.23.3 pandas-1.5.0
Note: you may need to restart the kernel to use updated packages.


In [None]:
def get_follower_detail():
    raise NotImplementedError  # stub: this helper was never filled in

Conveniently, the user objects from tweepy map very nicely onto a pandas DataFrame. A little pandas magic with json_normalize lets us explode the "public_metrics" dictionary into new columns in our frame.

In [51]:
import pandas as pd

gvr_followers_df = pd.DataFrame(gvr_followers[0])
# Expand the public_metrics dicts into columns, then drop the original dict column.
gvr_followers_df = gvr_followers_df.merge(
    pd.json_normalize(gvr_followers_df.public_metrics),
    left_index=True,
    right_index=True,
).drop("public_metrics", axis="columns")
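To see what the normalize step does in isolation, here is a minimal sketch on made-up records. The names `records`, `users_df`, `metrics_df` and `flat_df` are mine, and the values are illustrative only; nothing here comes from the Twitter API.

```python
import pandas as pd

# Made-up records shaped roughly like the user objects above (illustrative values).
records = [
    {"id": 1, "username": "alice",
     "public_metrics": {"followers_count": 10, "following_count": 3}},
    {"id": 2, "username": "bob",
     "public_metrics": {"followers_count": 250, "following_count": 40}},
]

users_df = pd.DataFrame(records)

# json_normalize turns each dict into one row, with one column per key.
metrics_df = pd.json_normalize(users_df["public_metrics"].tolist())

# Both frames share the default 0..n-1 index, so joining on the index lines the rows up.
flat_df = users_df.drop(columns="public_metrics").join(metrics_df)
```

The index-based join only works because neither frame has been filtered or re-sorted in between; if the index had changed, the metrics would land on the wrong rows.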

We can take a look around at the accounts that Guido is following. Their follower counts span a very wide range, from 16 to 19,250,000. At the top end, some of the accounts belong to celebrities, including:

- Edward Snowden
- Samantha Power, former US Ambassador to the UN
- Dave Matthews
- Grant Imahara
- Leonard Nimoy

We can also see a lot of organizations, such as Meta, BBC Breaking News, the Mars Rover, etc. It is interesting, though, that so many of these accounts follow very few accounts themselves, much more in line with typical people. Al Gore, for instance, has 2.9 million followers but is following just 40 accounts himself. If someone is following GVR back, it's a pretty strong indicator that they are part of the Python universe. In that case it would be useful to see who else they follow, as it gives us a sense of where they sit in the Python ecosystem and what they're interested in.

We are left with a problem, though. If we tried to get everyone's connections, we would have 508,982 user records to sort through. There would certainly be a lot of duplication in there, but just getting the list would be a difficult problem. The Twitter API returns a maximum of 1000 users per call, and there is a rate limit of 15 calls per 15 minutes. Getting the entire list, then, would take about 5 days. To get under this limit, we will need to take a different approach. We could consider:

- taking a sample of the accounts, knowing that this will leave us with missing nodes in the network
- limiting ourselves to at most 1000 subscriptions per account, which will leave us with nodes and edges missing in our network
- testing each account for the likelihood that it is a person rather than an institution or organization
- parsing the user's description for mentions of Python or one of the more popular libraries in the ecosystem (e.g. Django)
- taking an incremental approach, adding seeds that we know to our network and choosing next steps
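The description-parsing idea is the easiest of these to try. Below is a rough sketch, assuming a hand-picked keyword list; `PYTHON_KEYWORDS` and `mentions_python` are hypothetical names of mine, and the list itself is a guess that would need tuning.

```python
import re

# Hypothetical keyword list; extend or trim it to taste.
PYTHON_KEYWORDS = ["python", "django", "pandas", "numpy", "flask", "pypi", "psf"]

# Word boundaries keep a keyword from matching inside a longer word.
_KEYWORD_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, PYTHON_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def mentions_python(description):
    """Return True if the description mentions any keyword (case-insensitive)."""
    if not description:
        # Missing or empty descriptions cannot match anything.
        return False
    return _KEYWORD_RE.search(description) is not None
```

On the frame above this could be applied as `gvr_followers_df[gvr_followers_df.description.map(mentions_python)]`. Keyword matching is crude, though: a description that only mentions a handle like @ThePSF slips through, because "PSF" is glued to the preceding word.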

I'm kind of curious about the last approach. For example, it would be interesting to see the accounts that both Wes McKinney and Guido van Rossum follow.

In [63]:
gvr_followers_df[gvr_followers_df.name.str.contains("Wes")]

Unnamed: 0,description,id,location,name,username,followers_count,following_count,tweet_count,listed_count
337,CTO + co-founder @voltrondata. @ApacheArrow co...,115494880,"Nashville, TN",Wes McKinney,wesmckinn,59052,890,9440,1794
395,"software engineer, Python developer, technical...",8872712,"Silicon Valley, CA, USA",Wesley Chun,wescpy,3982,408,6118,178


In [65]:
wes_followers = client.get_users_following(115494880, max_results=1000, user_fields=["description", "public_metrics", "location"])

In [66]:
wes_followers

Response(data=[<User id=14738418 name=Al Sweigart username=AlSweigart>, <User id=10053622 name=Bryan Duxbury username=bryanduxbury>, <User id=1610240604 name=Dolphin Emulator username=Dolphin_Emu>, <User id=153087334 name=PCSX2 username=PCSX2>, <User id=43590278 name=Crystal Huang username=CrystalHuang>, <User id=2749409166 name=Jordan Tigani username=jrdntgn>, <User id=21223947 name=lloyd tabb username=lloydtabb>, <User id=1538914763916251136 name=MotherDuck username=motherduckdb>, <User id=18881614 name=Tim Head username=betatim>, <User id=750020524533747713 name=Hussain Sultan username=hussainsultan>, <User id=260399941 name=Gordon Shotwell username=gshotwell>, <User id=1496276482368237572 name=Neon username=neondatabase>, <User id=972773828471345153 name=houck⚡️ username=callmehouck>, <User id=3199856542 name=Alison Presmanes Hill username=apreshill>, <User id=1504956528914309122 name=Chaotic Nightclub Photos username=ClubPhotos_>, <User id=1135690652132564992 name=Retro Dodo usern
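With both lists in hand, the overlap is a set intersection on the user IDs. This is a minimal sketch with toy stand-ins for the user objects; `shared_accounts` is a hypothetical helper of mine, and it assumes only that each user exposes an `id` attribute.

```python
from types import SimpleNamespace


def shared_accounts(first, second):
    """Return the users in `first` whose id also appears in `second`."""
    second_ids = {user.id for user in second}
    # Keep the order of `first` so the result is stable and easy to inspect.
    return [user for user in first if user.id in second_ids]


# Toy stand-ins for user objects; only the attributes used here are modelled.
guido_side = [
    SimpleNamespace(id=1, username="a"),
    SimpleNamespace(id=2, username="b"),
    SimpleNamespace(id=3, username="c"),
]
wes_side = [
    SimpleNamespace(id=3, username="c"),
    SimpleNamespace(id=4, username="d"),
    SimpleNamespace(id=1, username="a"),
]

common = shared_accounts(guido_side, wes_side)
common_usernames = [user.username for user in common]
```

In this notebook the inputs would be the `data` field of each response, for example `shared_accounts(gvr_followers.data, wes_followers.data)`, bearing in mind that a single call only covers the first 1000 accounts.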

In [62]:
gvr_followers_df.sort_values(by=["following_count"]).tail(25)

Unnamed: 0,description,id,location,name,username,followers_count,following_count,tweet_count,listed_count
177,@TIME national political correspondent & NYT b...,130945778,"Washington, D.C.",Molly Ball,mollyesque,149136,3921,35338,3427
110,The Netherlands Embassy in the United States🇳🇱...,108360875,Washington D.C.,Netherlands Embassy 🇺🇸,NLintheUSA,41081,4018,28428,734
4,The #1 Python-focused podcast covering the peo...,3098427092,"Portland, OR USA",Talk Python Podcast,TalkPython,61431,4042,6076,1342
446,"Fun Stack Vibing. Started Xamarin, Mono, Gnome...",823083,"boston, ma",Miguel de Icaza,migueldeicaza,88111,4106,100871,3415
154,Nonprofit dedicated to providing affordable op...,166315104,,Girl Develop It,girldevelopit,117866,4374,4534,1989
164,Music||Teaching|Tech|Writing|Philosophy\n|| fi...,519048303,"New York, USA",dr. jess #i11o,jess_ingrass,3428,4398,10809,174
142,Championing Australian school girls using hand...,1919612864,Australia,Techgirlssuperheroes,TGAsuperheroes,6521,4555,32397,542
162,"@PyLadiesChicago, past @ThePSF Chair, Director...",21767394,"The Moon, Stars, & Cosmos.","Loooorena ""La 🐯 Tigresa” @ The Cosmos",loooorenanicole,7907,4653,19097,326
105,"Seriously, the only. Software Engineer, Teache...",2282964300,,The Only Nicholas Hunt-Walker in Existence,nhuntwalker,5065,4673,135763,200
198,"I code alone, yeah,\nWith nobody else.\nYeah, ...",15004019,Stockholm,Robert Virding,rvirding,7938,4898,9179,269


In [43]:
gvr_followers[0][0]

'<User id=14076724 name=John Lam username=john_lam>'

In [30]:
from tweepy import Paginator

gvr_followers = Paginator(client.get_users_following, 15804774, user_fields=["description", "location", "public_metrics"], max_results=1000).flatten()

In [31]:
len(gvr_followers)

TypeError: object of type 'generator' has no len()
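The error makes sense once you notice that `flatten()` hands back a generator, which produces users lazily and so has no length until it has been consumed. The sketch below reproduces the behaviour with a stand-in generator (`fake_flatten` is hypothetical, not part of tweepy) so that it runs without any API calls.

```python
def fake_flatten():
    """Stand-in for Paginator(...).flatten(): yields items lazily, one at a time."""
    for user_id in (101, 102, 103):
        yield user_id


lazy_users = fake_flatten()

try:
    len(lazy_users)
except TypeError as exc:
    # Same failure as in the cell above.
    error_message = str(exc)

# Materialise the generator once; afterwards it behaves like any other list.
users = list(lazy_users)
user_count = len(users)
```

Applied to the cell above, that would mean wrapping the paginator call in `list(...)` before calling `len`. Note that a generator can only be consumed once, and that with the real paginator each page fetched while building the list still counts against the rate limit.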