# If with Twitter Lab

### Introduction

In this lesson, we'll use our knowledge of `for` loops and `if`/`else` statements to search through Twitter data.

### Loading our Data

Let's begin by loading our data from a CSV file.  We'll convert the DataFrame to a list of dictionaries called `users`.

In [2]:
import pandas as pd

df = pd.read_csv('./twitter_accounts.csv', thousands = ',')
users = df.drop(columns = ['Rank', 'Activity', 'twitter handle']).to_dict('records')

In [3]:
users[:2]

[{'Name': 'Barack Obama',
  'Followers': 110890048,
  'Following': 610113,
  'Tweets': 15704,
  'Nationality/headquarters': 'U.S.A',
  'Industry': 'Politics'},
 {'Name': 'KATY PERRY',
  'Followers': 108315414,
  'Following': 222,
  'Tweets': 10202,
  'Nationality/headquarters': 'U.S.A',
  'Industry': 'Music'}]
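To make the "list of dictionaries" shape concrete, here is a minimal sketch that builds the same kind of structure with only the standard library. The two sample rows and the column layout are assumptions made for illustration, not the actual contents of `twitter_accounts.csv`.

```python
import csv
import io

# Two sample rows shaped like the lesson's data. The real layout of
# twitter_accounts.csv is an assumption here.
sample_csv = io.StringIO(
    'Name,Followers,Tweets\n'
    'Barack Obama,"110,890,048","15,704"\n'
    'KATY PERRY,"108,315,414","10,202"\n'
)

sample_users = []
for row in csv.DictReader(sample_csv):
    # Remove the thousands separators by hand, which is the job that
    # thousands=',' hands to pandas in read_csv.
    row['Followers'] = int(row['Followers'].replace(',', ''))
    row['Tweets'] = int(row['Tweets'].replace(',', ''))
    sample_users.append(row)

print(sample_users[0])
```

Each row becomes one dictionary keyed by column name, which is the same shape that `to_dict('records')` produces from a DataFrame.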

### Initial Exploration

We currently have a list of dictionaries, where each dictionary represents a top Twitter user.  We can find the number of users in our dataset by looking at the length of the list.

In [4]:
len(users)

100

So we can see that our list has one hundred elements, with each element being a dictionary that represents a different Twitter user.

In [5]:
users[:2]

[{'Name': 'Barack Obama',
  'Followers': 110890048,
  'Following': 610113,
  'Tweets': 15704,
  'Nationality/headquarters': 'U.S.A',
  'Industry': 'Politics'},
 {'Name': 'KATY PERRY',
  'Followers': 108315414,
  'Following': 222,
  'Tweets': 10202,
  'Nationality/headquarters': 'U.S.A',
  'Industry': 'Music'}]

Ok, so now let's find the names of all of the top Twitter users and store them in a list.

In [6]:
twitter_names = []

for user in users:
    twitter_names.append(user['Name'])

In [7]:
twitter_names[:5]

# ['Barack Obama', 'KATY PERRY', 'Justin Bieber', 'Rihanna', 'Taylor Swift']

['Barack Obama', 'KATY PERRY', 'Justin Bieber', 'Rihanna', 'Taylor Swift']

We can confirm that our new list has 100 elements, with each element as a string.

In [8]:
len(twitter_names)

100
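The loop-and-append pattern above can also be written as a list comprehension. The sketch below compares the two on a three-user sample copied from the outputs in this lesson; the variable names are made up for illustration.

```python
# Sample data copied from the lesson's output, trimmed to two keys.
sample_users = [
    {'Name': 'Barack Obama', 'Industry': 'Politics'},
    {'Name': 'KATY PERRY', 'Industry': 'Music'},
    {'Name': 'Justin Bieber', 'Industry': 'Music'},
]

# Loop-and-append version, as in the lesson.
names_from_loop = []
for user in sample_users:
    names_from_loop.append(user['Name'])

# Equivalent list comprehension.
names_from_comprehension = [user['Name'] for user in sample_users]

print(names_from_loop == names_from_comprehension)  # True
```

Both forms build the same list. The explicit loop is easier to extend with extra steps, while the comprehension is more compact for a simple one-value-per-element mapping.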

Now let's find out more about our Twitter users.  Below, create a list of the nationality of each user.

In [9]:
nationalities = []
for user in users:
    nationalities.append(user['Nationality/headquarters'])

In [10]:
nationalities[:8]

['U.S.A', 'U.S.A', 'Canada', 'Barbados', 'U.S.A', 'Portugal', 'U.S.A', 'U.S.A']

To get a unique list of nationalities, we can take our list of nationalities and wrap it in a `set`, like so.

In [11]:
unique_nationalities = set(nationalities)

print(unique_nationalities)

{'Portugal', 'Ireland', 'France', 'Puerto Rico', 'Saudi Arabia', 'India', 'Spain', 'Barbados', 'U.K', 'Germany', 'Columbia', 'Korean', 'U.S.A', 'Europe', 'Canada', 'Brazil'}


> A set is a collection of elements where every element is unique.  So by converting our list to a set, we effectively removed the duplicates from our list of countries.

In [12]:
nationalities[:8] # ['U.S.A', 'U.S.A', 'Canada', 'Barbados', 'U.S.A', 'Portugal', 'U.S.A', 'U.S.A']

unique_nationalities = set(nationalities)
unique_nationalities

{'Barbados',
 'Brazil',
 'Canada',
 'Columbia',
 'Europe',
 'France',
 'Germany',
 'India',
 'Ireland',
 'Korean',
 'Portugal',
 'Puerto Rico',
 'Saudi Arabia',
 'Spain',
 'U.K',
 'U.S.A'}
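As a small worked example of the deduplication, the sketch below applies `set` to just the first eight nationalities shown earlier. Because a set has no guaranteed order, it uses `sorted` to get a stable listing; the variable names are illustrative.

```python
# The first eight nationalities, copied from the output above.
first_eight = ['U.S.A', 'U.S.A', 'Canada', 'Barbados',
               'U.S.A', 'Portugal', 'U.S.A', 'U.S.A']

unique_sample = set(first_eight)

# Eight entries collapse to four distinct values.
print(len(first_eight), len(unique_sample))  # 8 4

# sorted() returns a list in a predictable order.
print(sorted(unique_sample))  # ['Barbados', 'Canada', 'Portugal', 'U.S.A']
```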

Next, use the pattern above to create a set of the unique `Industry` of each user.

In [13]:
industries = []

for user in users:
    industries.append(user['Industry'])

In [14]:
# update the line below to get a unique set of industries
unique_industries = set(industries)

print(unique_industries)

{'Technology ', 'Films/Entertainment', 'news', 'Politics', 'News', 'Space Agency', 'music', 'Music', 'Publishing Industry', 'Sports', 'Television', 'Business', 'sports'}


> One thing we may note is that our data is a bit messy, with `sports` sometimes being lowercase and sometimes being capitalized.

Ok, so now that we know what our top Twitter users do, and where they are from, let's dig in on a specific country.  Create a **list of dictionaries** of all of the Twitter users in Canada and assign it to the variable `canadian_users`.

In [15]:
canadian_users = []

for user in users:
    if user['Nationality/headquarters'] == 'Canada':
        canadian_users.append(user)

> We can see that there are four Canadian users.

In [16]:
len(canadian_users)

4

And if we print out each of them we should see the following:

In [17]:
for canadian_user in canadian_users:
    print(canadian_user)

{'Name': 'Justin Bieber', 'Followers': 107410873, 'Following': 296418, 'Tweets': 30462, 'Nationality/headquarters': 'Canada', 'Industry': 'Music'}
{'Name': 'Drake', 'Followers': 38781907, 'Following': 625, 'Tweets': 1745, 'Nationality/headquarters': 'Canada', 'Industry': 'Music'}
{'Name': 'Shawn Mendes', 'Followers': 24451861, 'Following': 57964, 'Tweets': 14945, 'Nationality/headquarters': 'Canada', 'Industry': 'music'}
{'Name': 'Avril Lavigne', 'Followers': 21546496, 'Following': 167, 'Tweets': 4107, 'Nationality/headquarters': 'Canada', 'Industry': 'music'}
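The filter above follows a general shape: loop over the users, test a condition, keep the matches. One possible way to package that shape is a small helper, sketched here on a three-user sample. The function name `users_from` is hypothetical and not part of the lab.

```python
# Sample data copied from the lesson's output, trimmed to two keys.
sample_users = [
    {'Name': 'Justin Bieber', 'Nationality/headquarters': 'Canada'},
    {'Name': 'Rihanna', 'Nationality/headquarters': 'Barbados'},
    {'Name': 'Drake', 'Nationality/headquarters': 'Canada'},
]


def users_from(users, country):
    """Return the user dictionaries whose nationality equals country."""
    return [user for user in users
            if user['Nationality/headquarters'] == country]


canadians = users_from(sample_users, 'Canada')
print([user['Name'] for user in canadians])  # ['Justin Bieber', 'Drake']
```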


Next, let's see which of our Twitter users are popular without even trying.  Create a **list of dictionaries** of Twitter users who have tweeted fewer than 1000 times yet still are in our top 100.

In [18]:
lazy_tweeters = []

for user in users:
    if user['Tweets'] < 1000:
        lazy_tweeters.append(user)

In [19]:
for lazy_tweeter in lazy_tweeters:
    print(lazy_tweeter)

{'Name': 'Taylor Swift', 'Followers': 85520236, 'Following': 0, 'Tweets': 396, 'Nationality/headquarters': 'U.S.A', 'Industry': 'Music'}
{'Name': 'Adele', 'Followers': 27488867, 'Following': 0, 'Tweets': 310, 'Nationality/headquarters': 'U.K', 'Industry': 'music'}
{'Name': 'daniel tosh', 'Followers': 25852762, 'Following': 125, 'Tweets': 15, 'Nationality/headquarters': 'Germany', 'Industry': 'Films/Entertainment'}
{'Name': 'Aamir Khan', 'Followers': 25403135, 'Following': 9, 'Tweets': 777, 'Nationality/headquarters': 'India', 'Industry': 'Films/Entertainment'}
{'Name': 'Marshall Mathers', 'Followers': 22785561, 'Following': 0, 'Tweets': 925, 'Nationality/headquarters': 'U.S.A', 'Industry': 'Music'}


And next create a **list of names** of each user who is following fewer than 10 people and assign them to the variable `low_following_users`.

In [20]:
low_following_users = []

for user in users:
    if user['Following'] < 10:
        low_following_users.append(user['Name'])

In [21]:
print(low_following_users)

['Taylor Swift', 'Twitter', 'BBC Breaking News', 'Chris Brown', "Conan O'Brien", 'Adele', 'Aamir Khan', 'Marshall Mathers', 'A.R.Rahman', 'MohamadAlarefe']


> Find the number of low-following users below.

In [22]:
len(low_following_users)

# 10

10

### More advanced queries

Ok, so now that we have worked through some initial problems to filter our data, let's move on to some more complicated problems.  Remember that if we look at our list of industries, sometimes the industry is capitalized and sometimes it's lowercase.

In [23]:
industries = []

for user in users:
    industries.append(user['Industry'])
    
print(set(industries))

{'Technology ', 'Films/Entertainment', 'news', 'Politics', 'News', 'Space Agency', 'music', 'Music', 'Publishing Industry', 'Sports', 'Television', 'Business', 'sports'}


This means that if we try to find those in the music industry by seeing what matches `Music`, we'll miss all of the users who have music listed in lowercase.  One way to handle this is with an `or` statement.  We can use it like so.

In [24]:
musical_usernames = []

for user in users:
    if user['Industry'] == 'Music' or user['Industry'] == 'music':
        musical_usernames.append(user['Name'])

In [25]:
musical_usernames[:3]

['KATY PERRY', 'Justin Bieber', 'Rihanna']

Let's focus on the if statement above.

```python
if user['Industry'] == 'Music' or user['Industry'] == 'music':
    ...
```

So now our `if` statement is making two evaluations.  If either `user['Industry'] == 'Music'` or `user['Industry'] == 'music'` returns True, the entire statement will evaluate to True.

In [26]:
user = {'Name': 'KATY PERRY', 'Industry': 'music'}

user['Industry'] == 'Music' or user['Industry'] == 'music' # True

True
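To see how the `or` condition behaves across several values, the sketch below evaluates it for a few sample industry strings. It also shows a membership test with `in`, which is an alternative spelling of the same check rather than something the lab asks for.

```python
sample_industries = ['Music', 'music', 'Sports', 'Politics']

results = []
for industry in sample_industries:
    # True if either comparison is True.
    with_or = industry == 'Music' or industry == 'music'
    # Same check, written as membership in a tuple of accepted values.
    with_in = industry in ('Music', 'music')
    results.append((industry, with_or, with_in))

for row in results:
    print(row)
```

Both columns agree for every value: only `'Music'` and `'music'` produce `True`.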

Ok, so this time it's your turn.  Use an `or` statement to find a list of all users who are in sports, whether sports is capitalized or lowercase.

> Do not copy and paste the above statement -- we need to build muscle memory.

In [35]:
athletic_usernames = []

# add code here

athletic_usernames[:5]

# ['Cristiano Ronaldo', 'Neymar Jr', 'LeBron James', 'SportsCenter', 'ESPN']

['Cristiano Ronaldo', 'Neymar Jr', 'LeBron James', 'SportsCenter', 'ESPN']

> Confirm below that we have all 17 users.

In [36]:
len(athletic_usernames)

17
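Chaining `or` works for two spellings, but it grows quickly as more variants appear. A possible alternative, sketched below, is to normalize each label before comparing. The `normalize` helper is hypothetical, and the sample labels are copied from the industries output earlier in this lesson, including `'Technology '` with its trailing space.

```python
# A few raw labels copied from the industries output in this lesson.
raw_industries = ['Technology ', 'news', 'News', 'music', 'Music',
                  'Sports', 'sports']


def normalize(label):
    """Trim surrounding whitespace and lowercase the label."""
    return label.strip().lower()


# A set comprehension removes the duplicates left after normalizing.
normalized = sorted({normalize(label) for label in raw_industries})
print(normalized)  # ['music', 'news', 'sports', 'technology']
```

With normalized labels, a single comparison such as `normalize(user['Industry']) == 'sports'` would cover both spellings.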

Finally, while `or` evaluates to True if either condition holds, the keyword `and` only returns True when **both** conditions evaluate to `True`.  For example, let's say we want to find users who follow fewer than 10 accounts *and* have fewer than 500 tweets.  We can do so with the following:

In [50]:
low_activity_users = []

for user in users:
    if user['Following'] < 10 and user['Tweets'] < 500:
        low_activity_users.append(user)
        
low_activity_users[:5]

[{'Name': 'Taylor Swift',
  'Followers': 85520236,
  'Following': 0,
  'Tweets': 396,
  'Nationality/headquarters': 'U.S.A',
  'Industry': 'Music'},
 {'Name': 'Adele',
  'Followers': 27488867,
  'Following': 0,
  'Tweets': 310,
  'Nationality/headquarters': 'U.K',
  'Industry': 'music'}]
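To contrast the two keywords directly, the sketch below runs the same pair of conditions with `and` and then with `or` on four users whose numbers are copied from the outputs in this lesson. The variable names are illustrative.

```python
# Values copied from the lesson's output, trimmed to three keys.
sample_users = [
    {'Name': 'Taylor Swift', 'Following': 0, 'Tweets': 396},
    {'Name': 'Marshall Mathers', 'Following': 0, 'Tweets': 925},
    {'Name': 'daniel tosh', 'Following': 125, 'Tweets': 15},
    {'Name': 'KATY PERRY', 'Following': 222, 'Tweets': 10202},
]

# and: both conditions must hold.
both = [u['Name'] for u in sample_users
        if u['Following'] < 10 and u['Tweets'] < 500]

# or: at least one condition must hold.
either = [u['Name'] for u in sample_users
          if u['Following'] < 10 or u['Tweets'] < 500]

print(both)    # ['Taylor Swift']
print(either)  # ['Taylor Swift', 'Marshall Mathers', 'daniel tosh']
```

Marshall Mathers passes only the `Following` test and daniel tosh passes only the `Tweets` test, so they appear with `or` but are excluded by `and`.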

### Summary