## Adding a Pandas Column with More Complicated Conditions

In [5]:
import pandas as pd
import numpy as np

### Reading the CSV file from the website

In [2]:
df = pd.read_csv('https://www.dataquest.io/wp-content/uploads/2020/06/dataquest_tweets_csv.csv')

In [3]:
df

Unnamed: 0,date,time,tweet,mentions,photos,replies_count,retweets_count,likes_count
0,2020-06-29,09:36:13,"""Going from 15k to 170k in salary...There's no...",[],['https://pbs.twimg.com/media/EbrvY0SWoAcEQWn....,0,0,0
1,2020-06-26,14:14:44,"Hi Tom, sorry about that! We try to respond to...",['beingtomiwa'],[],0,0,0
2,2020-06-26,12:00:14,Become an @rstudio power user: https://bit.ly/...,['rstudio'],[],0,2,7
3,2020-06-26,11:23:21,Please get in touch with our support team by e...,['jimohkassim'],[],0,0,0
4,2020-06-26,02:00:05,Learn to master R markdown in this free tutori...,[],[],0,2,7
...,...,...,...,...,...,...,...,...
4524,2015-07-01,20:35:57,"@Ariella_CG Very strange, we'll look into this...",['ariella_cg'],[],1,0,0
4525,2015-06-26,09:50:38,@taylor_atx @keen_io We used it at a learning ...,"['taylor_atx', 'keen_io']",[],1,0,1
4526,2015-06-25,20:53:37,"@keen_io Thanks, we love keen.io (and are happ...",['keen_io'],[],2,0,1
4527,2015-06-22,04:30:10,Why image recognition still needs a lot of wor...,[],[],0,0,0


To dig deeper into whether tweets with images get more interaction, we might want to create a few interactivity “tiers” and assess what percentage of the tweets in each tier contained images. For simplicity’s sake, let’s use likes to measure interactivity and separate tweets into four tiers:

* tier_4 — 2 or fewer likes
* tier_3 — 3-9 likes
* tier_2 — 10-15 likes
* tier_1 — 16+ likes

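To accomplish this, we can use a function called np.select(). We’ll give it two arguments: a list of our conditions, and a corresponding list of the values we’d like to assign to each row in our new column.

The order of the two lists matters, because np.select() pairs them up by position: if the first condition in our conditions list is met, the first value in our values list will be assigned to our new column for that row. If the second condition is met, the second value will be assigned, and so on. When a row meets more than one condition, the first matching condition wins. Rows that meet none of the conditions get the default value, which is 0 unless we pass a different default.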
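As a small illustration of how position decides the result, the sketch below runs np.select() on a made-up array of like counts with two deliberately overlapping conditions. The array, the labels, and the 'low' default are assumptions chosen for the example and are not part of the tweets dataset.

```python
import numpy as np

# Hypothetical like counts, used only for this illustration
likes = np.array([1, 5, 12, 40])

# 12 and 40 satisfy both conditions, but only the first match is used
conditions = [likes > 10, likes > 3]
values = ['high', 'medium']

# default fills elements that match no condition; a string default
# keeps the result consistent with the string values
labels = np.select(conditions, values, default='low')
print(labels)  # ['low' 'medium' 'high' 'high']

# Swapping the order changes the outcome: 'likes > 3' now matches first
swapped = np.select([likes > 3, likes > 10], ['medium', 'high'], default='low')
print(swapped)  # ['low' 'medium' 'medium' 'medium']
```

In the swapped version the 'high' label can never be assigned, which is why the tier conditions in this tutorial are written so that they do not overlap.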

In [6]:
# create a list of our conditions
conditions = [
    (df['likes_count'] <= 2),
    (df['likes_count'] > 2) & (df['likes_count'] <= 9),
    (df['likes_count'] > 9) & (df['likes_count'] <= 15),
    (df['likes_count'] > 15)
    ]

# create a list of the values we want to assign for each condition
values = ['tier_4', 'tier_3', 'tier_2', 'tier_1']

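# create a new column and use np.select to assign values to it using our lists as arguments
# default='unknown' (an arbitrary placeholder) labels rows that match no condition; newer NumPy versions raise a TypeError if string values are mixed with the numeric default of 0
df['tier'] = np.select(conditions, values, default='unknown')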

# display updated DataFrame
df.head()

Unnamed: 0,date,time,tweet,mentions,photos,replies_count,retweets_count,likes_count,tier
0,2020-06-29,09:36:13,"""Going from 15k to 170k in salary...There's no...",[],['https://pbs.twimg.com/media/EbrvY0SWoAcEQWn....,0,0,0,tier_4
1,2020-06-26,14:14:44,"Hi Tom, sorry about that! We try to respond to...",['beingtomiwa'],[],0,0,0,tier_4
2,2020-06-26,12:00:14,Become an @rstudio power user: https://bit.ly/...,['rstudio'],[],0,2,7,tier_3
3,2020-06-26,11:23:21,Please get in touch with our support team by e...,['jimohkassim'],[],0,0,0,tier_4
4,2020-06-26,02:00:05,Learn to master R markdown in this free tutori...,[],[],0,2,7,tier_3
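With the tier column in place, one possible way to get the share of tweets with images in each tier is sketched below. It runs on a small made-up DataFrame rather than the real data, and it assumes the photos column is stored as text, so that a tweet without images holds the literal string '[]'. The has_image column and the image_share name are hypothetical helpers introduced for this example.

```python
import pandas as pd

# Hypothetical miniature of the tweets DataFrame with only the columns we need
sample = pd.DataFrame({
    'photos': [
        "['https://example.com/a.jpg']", '[]', '[]', '[]',
        "['https://example.com/b.jpg']", '[]',
    ],
    'tier': ['tier_4', 'tier_4', 'tier_4', 'tier_4', 'tier_3', 'tier_3'],
})

# Assumption: photos is text read from the CSV, so '[]' means "no images"
sample['has_image'] = sample['photos'] != '[]'

# The mean of a boolean column is the fraction of True values,
# so multiplying by 100 gives the percentage of tweets with images per tier
image_share = sample.groupby('tier')['has_image'].mean() * 100
print(image_share)
```

The same two steps should carry over to df if the photos column there follows the same convention, which is worth checking with df['photos'].head() first.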
