# Pandas Fundamentals

>This notebook was created and executed by Raven-Alexa Dixon to showcase Pandas Foundations.

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
movies = pd.read_csv('movie.csv')

# Understanding Your Data

In [3]:
movies

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
4913,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
4914,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


In [4]:
movies.dtypes

color                         object
director_name                 object
num_critic_for_reviews       float64
duration                     float64
director_facebook_likes      float64
actor_3_facebook_likes       float64
actor_2_name                  object
actor_1_facebook_likes       float64
gross                        float64
genres                        object
actor_1_name                  object
movie_title                   object
num_voted_users                int64
cast_total_facebook_likes      int64
actor_3_name                  object
facenumber_in_poster         float64
plot_keywords                 object
movie_imdb_link               object
num_user_for_reviews         float64
language                      object
country                       object
content_rating                object
budget                       float64
title_year                   float64
actor_2_facebook_likes       float64
imdb_score                   float64
aspect_ratio                 float64
m

In [5]:
movies.dtypes.value_counts()

float64    13
object     12
int64       3
dtype: int64

In [6]:
movies.info() #Shows how many rows there are (entries) and, per column, how many values are not null (Non-Null Count),
              #which in turn gives you an idea of how many values ARE null. It also shows the amount of memory used by the df.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4916 entries, 0 to 4915
Data columns (total 28 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   color                      4897 non-null   object 
 1   director_name              4814 non-null   object 
 2   num_critic_for_reviews     4867 non-null   float64
 3   duration                   4901 non-null   float64
 4   director_facebook_likes    4814 non-null   float64
 5   actor_3_facebook_likes     4893 non-null   float64
 6   actor_2_name               4903 non-null   object 
 7   actor_1_facebook_likes     4909 non-null   float64
 8   gross                      4054 non-null   float64
 9   genres                     4916 non-null   object 
 10  actor_1_name               4909 non-null   object 
 11  movie_title                4916 non-null   object 
 12  num_voted_users            4916 non-null   int64  
 13  cast_total_facebook_likes  4916 non-null   int64

# Selecting Columns

In [7]:
#Selecting a single column using the index operator.
movies['director_name']

0           James Cameron
1          Gore Verbinski
2              Sam Mendes
3       Christopher Nolan
4             Doug Walker
              ...        
4911          Scott Smith
4912                  NaN
4913     Benjamin Roberds
4914          Daniel Hsia
4915             Jon Gunn
Name: director_name, Length: 4916, dtype: object

In [8]:
#Selecting a single column using attribute access (also known as dot notation).
movies.director_name

0           James Cameron
1          Gore Verbinski
2              Sam Mendes
3       Christopher Nolan
4             Doug Walker
              ...        
4911          Scott Smith
4912                  NaN
4913     Benjamin Roberds
4914          Daniel Hsia
4915             Jon Gunn
Name: director_name, Length: 4916, dtype: object

In [10]:
# .loc and .iloc can also be used to pull out a Series. .loc selects by label (here, the column name) and .iloc selects by integer position.
movies.loc[:,'director_name']

0           James Cameron
1          Gore Verbinski
2              Sam Mendes
3       Christopher Nolan
4             Doug Walker
              ...        
4911          Scott Smith
4912                  NaN
4913     Benjamin Roberds
4914          Daniel Hsia
4915             Jon Gunn
Name: director_name, Length: 4916, dtype: object

In [11]:
movies.iloc[:,1] # the number 1 pulls the second column, which is director_name. Remember, Python indexing starts at 0.

0           James Cameron
1          Gore Verbinski
2              Sam Mendes
3       Christopher Nolan
4             Doug Walker
              ...        
4911          Scott Smith
4912                  NaN
4913     Benjamin Roberds
4914          Daniel Hsia
4915             Jon Gunn
Name: director_name, Length: 4916, dtype: object
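
The four cells above all return the same Series. Here is a minimal sketch that checks this on a small made-up DataFrame (the `toy` frame and its values are placeholders, not the real movie.csv data):

```python
import pandas as pd

# Hypothetical two-column frame standing in for the movies data.
toy = pd.DataFrame(
    {
        "color": ["Color", "Color", None],
        "director_name": ["James Cameron", "Gore Verbinski", None],
    }
)

by_index_operator = toy["director_name"]   # index operator
by_attribute = toy.director_name           # attribute access (dot notation)
by_label = toy.loc[:, "director_name"]     # label-based selection
by_position = toy.iloc[:, 1]               # position-based selection

# Series.equals treats missing values in the same position as equal.
all_equal = (
    by_index_operator.equals(by_attribute)
    and by_index_operator.equals(by_label)
    and by_index_operator.equals(by_position)
)
print(all_equal)
```

Attribute access only works when the column name is a valid Python identifier and does not collide with an existing DataFrame attribute or method, so the index operator is usually the safer habit.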

# Random Column Info

In [12]:
movies['director_name'].index

RangeIndex(start=0, stop=4916, step=1)

In [13]:
movies['director_name'].size

4916

In [14]:
movies['director_name'].dtype

dtype('O')

In [15]:
movies['director_name'].name

'director_name'

In [16]:
type(movies['director_name'])

pandas.core.series.Series

**IMPORTANT CELL BELOW**

In [17]:
#We know the director_name column dtype is object. We also know that an object column can hold a mixture of data types.
#Use this method to determine each unique type in this column.

movies["director_name"].apply(type).unique()

array([<class 'str'>, <class 'float'>], dtype=object)
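
The `float` in that result most likely comes from the missing values, since NaN is a float. A small sketch with a made-up Series (the `names` variable and its values are illustrative only) shows the same pattern:

```python
import numpy as np
import pandas as pd

# Object-dtype Series with one missing value; the values are placeholders.
names = pd.Series(["James Cameron", np.nan, "Sam Mendes"], dtype=object)

# Collect the name of every distinct Python type found in the column.
type_names = sorted(t.__name__ for t in names.apply(type).unique())
print(type_names)

# The only non-string entries are the missing ones.
non_string = names[names.apply(type) != str]
print(non_string.isna().all())
```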

In [22]:
#Depending on your dataset, using the sample method might provide better insight into your data as opposed to the head method
#because the first rows might be very different from subsequent rows.

#When you don't pass a number in the parentheses, you will just get 1 row of data.


movies.sample(5)

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
4857,Color,Mike Flanagan,125.0,87.0,59.0,9.0,Courtney Bell,35.0,,Drama|Horror|Mystery,...,136.0,English,USA,R,70000.0,2011.0,28.0,5.8,1.78,3000
636,Color,Timur Bekmambetov,393.0,105.0,335.0,911.0,Dominic Cooper,3000.0,37516013.0,Action|Fantasy|Horror,...,348.0,English,USA,R,69000000.0,2012.0,3000.0,5.9,2.35,98000
2862,Color,Abel Ferrara,48.0,99.0,220.0,599.0,Vincent Gallo,812.0,1227324.0,Crime|Drama,...,48.0,English,USA,R,12500000.0,1996.0,787.0,6.6,1.85,344
1504,Color,Rand Ravich,107.0,109.0,7.0,1000.0,Charlize Theron,40000.0,10654581.0,Drama|Sci-Fi|Thriller,...,260.0,English,USA,R,34000000.0,1999.0,9000.0,5.3,1.85,1000
727,Color,Robert Redford,96.0,170.0,0.0,380.0,Kristin Scott Thomas,19000.0,75370763.0,Drama|Romance|Western,...,263.0,English,USA,PG-13,60000000.0,1998.0,1000.0,6.5,1.85,0


In [24]:
movies.director_name.size #Shows how many values there are including non-null and null. The .count() method shows non-null values.

4916

In [25]:
movies.director_name.shape #Shows the dimensions as a tuple. A Series is one-dimensional, so there is no column count (this is a single column from the df).

(4916,)

In [26]:
len(movies.director_name)

4916

In [27]:
movies.director_name.unique()

array(['James Cameron', 'Gore Verbinski', 'Sam Mendes', ...,
       'Scott Smith', 'Benjamin Roberds', 'Daniel Hsia'], dtype=object)

**IMPORTANT NOTE BELOW**

In [28]:
#The .count() method returns the number of NON-MISSING VALUES. NOT THE TOTAL COUNT OF ITEMS.
director = movies['director_name']
director.count()

4814

**VARIATIONS OF DESCRIBE**

In [30]:
#Describe using a numerical column:
fb_likes = movies["actor_1_facebook_likes"]
fb_likes.describe()

count      4909.000000
mean       6494.488491
std       15106.986884
min           0.000000
25%         607.000000
50%         982.000000
75%       11000.000000
max      640000.000000
Name: actor_1_facebook_likes, dtype: float64

In [31]:
#Describe using a column with an object data type:
director.describe()

count                 4814
unique                2397
top       Steven Spielberg
freq                    26
Name: director_name, dtype: object

In [32]:
#The .quantile() method calculates the quantile of numeric data. Note that if you pass in a scalar, you will get scalar output,
#but if you pass in a list, the output is a pandas Series.

fb_likes.quantile(0.2)

510.0

In [33]:
fb_likes.quantile([0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8])

0.1      240.0
0.2      510.0
0.3      694.0
0.4      854.0
0.5      982.0
0.6     1000.0
0.7     8000.0
0.8    13000.0
Name: actor_1_facebook_likes, dtype: float64
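
To make the scalar-versus-list behavior easier to see, here is a small sketch on made-up numbers (the `values` Series is an assumption, chosen so the quantiles fall exactly on data points):

```python
import pandas as pd

values = pd.Series([0, 10, 20, 30, 40])

# A scalar argument returns a single number.
median = values.quantile(0.5)

# A list argument returns a Series indexed by the requested quantiles.
quartiles = values.quantile([0.25, 0.75])

print(median)
print(quartiles)
```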

In [34]:
director.isna() #Returns a series, used to determine whether values are missing. Also, .notna() shows values that are not missing.

0       False
1       False
2       False
3       False
4       False
        ...  
4911    False
4912     True
4913    False
4914    False
4915    False
Name: director_name, Length: 4916, dtype: bool

**You can determine whether there are missing values in a Series by observing whether or not the result from .count()
matches the result from .size**

In [36]:
fb_likes.count() #shows number of non-null entries, the output below means that we do have null values.

4909

In [35]:
#To replace all missing values with 0:

fb_likes_filled = fb_likes.fillna(0)
fb_likes_filled.count() #Remember, the .count() method returns non-null values only, this shows all nulls have been replaced.

4916

In [37]:
#To remove entries with missing/null values:
fb_likes_dropped = fb_likes.dropna()
fb_likes_dropped.size

4909

**A more direct approach to determining whether a column contains missing/null values:**

In [42]:
director.hasnans  #This is an attribute, no parentheses needed. Use parentheses for methods.

True
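
The missing-value checks from this section can be put side by side. This is only a sketch on a made-up Series (`likes` and its values are placeholders):

```python
import numpy as np
import pandas as pd

likes = pd.Series([1000.0, np.nan, 131.0, np.nan])

total = likes.size             # every entry, missing or not
present = likes.count()        # non-missing entries only
missing = likes.isna().sum()   # True counts as 1 when summed
has_missing = likes.hasnans    # attribute, so no parentheses

print(total, present, missing, has_missing)
```

The difference between `.size` and `.count()` is the number of missing values, which is the same number that `.isna().sum()` reports directly.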

>Fun .value_counts() Fact Below!

In [39]:
director.value_counts()

Steven Spielberg    26
Woody Allen         22
Martin Scorsese     20
Clint Eastwood      20
Ridley Scott        16
                    ..
John Putch           1
Luca Guadagnino      1
Sam Fell             1
Dan Fogelman         1
Daniel Hsia          1
Name: director_name, Length: 2397, dtype: int64

In [38]:
#When using the .value_counts() method, setting the normalize parameter to True returns the relative frequencies rather
#than the counts.

director.value_counts(normalize=True)

Steven Spielberg    0.005401
Woody Allen         0.004570
Martin Scorsese     0.004155
Clint Eastwood      0.004155
Ridley Scott        0.003324
                      ...   
John Putch          0.000208
Luca Guadagnino     0.000208
Sam Fell            0.000208
Dan Fogelman        0.000208
Daniel Hsia         0.000208
Name: director_name, Length: 2397, dtype: float64
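
One thing worth noticing: .value_counts() leaves out missing values by default, so the relative frequencies are shares of the non-missing entries. In the output above, 0.005401 is 26 / 4814, not 26 / 4916. A short sketch on a made-up Series (the `directors` values are placeholders) shows this, along with the `dropna=False` option for counting missing values too:

```python
import numpy as np
import pandas as pd

directors = pd.Series(["A", "B", "A", np.nan], dtype=object)

counts = directors.value_counts()                    # NaN is left out
shares = directors.value_counts(normalize=True)      # shares of non-missing values
with_missing = directors.value_counts(dropna=False)  # NaN gets its own row

print(counts)
print(shares)
print(with_missing)
```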

# Series Operations Section

In [43]:
imdb_score = movies['imdb_score']
imdb_score

0       7.9
1       7.1
2       6.8
3       8.5
4       7.1
       ... 
4911    7.7
4912    7.5
4913    6.3
4914    6.3
4915    6.6
Name: imdb_score, Length: 4916, dtype: float64

**Python Operators:**

In [44]:
imdb_score + 1

0       8.9
1       8.1
2       7.8
3       9.5
4       8.1
       ... 
4911    8.7
4912    8.5
4913    7.3
4914    7.3
4915    7.6
Name: imdb_score, Length: 4916, dtype: float64

In [51]:
imdb_score.add(1) #Equivalent to the above command.

0       8.9
1       8.1
2       7.8
3       9.5
4       8.1
       ... 
4911    8.7
4912    8.5
4913    7.3
4914    7.3
4915    7.6
Name: imdb_score, Length: 4916, dtype: float64

In [45]:
imdb_score * 2.5

0       19.75
1       17.75
2       17.00
3       21.25
4       17.75
        ...  
4911    19.25
4912    18.75
4913    15.75
4914    15.75
4915    16.50
Name: imdb_score, Length: 4916, dtype: float64

In [46]:
imdb_score // 7 # Double slash represents floor division. 

0       1.0
1       1.0
2       0.0
3       1.0
4       1.0
       ... 
4911    1.0
4912    1.0
4913    0.0
4914    0.0
4915    0.0
Name: imdb_score, Length: 4916, dtype: float64

In [48]:
imdb_score % 7

0       0.9
1       0.1
2       6.8
3       1.5
4       0.1
       ... 
4911    0.7
4912    0.5
4913    6.3
4914    6.3
4915    6.6
Name: imdb_score, Length: 4916, dtype: float64

>Comparison Operators

In [49]:
imdb_score > 7

0        True
1        True
2       False
3        True
4        True
        ...  
4911     True
4912     True
4913    False
4914    False
4915    False
Name: imdb_score, Length: 4916, dtype: bool

In [52]:
imdb_score.gt(7) #Equivalent to the above command.

0        True
1        True
2       False
3        True
4        True
        ...  
4911     True
4912     True
4913    False
4914    False
4915    False
Name: imdb_score, Length: 4916, dtype: bool

In [50]:
# (director is movies['director_name'])
director == 'James Cameron'

0        True
1       False
2       False
3       False
4       False
        ...  
4911    False
4912    False
4913    False
4914    False
4915    False
Name: director_name, Length: 4916, dtype: bool

**Note regarding imdb_score.add() and imdb_score.gt()**
>Why does pandas offer a method equivalent to these operators? By its nature, an operator only operates in exactly one manner. Methods, on the other hand, can have parameters that allow you to alter their default functionality.

*Example:* The .sub() method performs subtraction on a Series. When you subtract with the (-) operator, missing values stay missing (the result is NaN). However, the .sub() method allows you to specify a fill_value parameter to use in place of missing values.

In [54]:
numbers = pd.Series([100,20,None])
numbers - 15

0    85.0
1     5.0
2     NaN
dtype: float64

In [55]:
numbers.sub(15, fill_value=0)

0    85.0
1     5.0
2   -15.0
dtype: float64
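
The same fill_value idea applies when both operands are Series. Below is a minimal sketch with two made-up Series (`a` and `b` are placeholders); note that a position missing in both inputs stays missing even with fill_value:

```python
import pandas as pd

a = pd.Series([1.0, None, None])
b = pd.Series([10.0, 20.0, None])

# With the operator, any missing operand makes the result NaN.
with_operator = a + b

# With the method, a value missing on one side only is replaced by fill_value.
with_method = a.add(b, fill_value=0)

print(with_operator)
print(with_method)
```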

# Chaining Series methods

In [56]:
director.value_counts().head(3) #Top 3 director counts.

Steven Spielberg    26
Woody Allen         22
Martin Scorsese     20
Name: director_name, dtype: int64

In [57]:
fb_likes.isna().sum() #Common way to count the number of missing values. Without the .sum() you get a long series.

7

>All the non-missing values of fb_likes should be integers as it is impossible to have a partial Facebook like. In most pandas versions, any numeric columns with missing values must have their data type as float (pandas 0.24 introduced the Int64 type, which supports missing values but is not used by default). If we fill missing values from fb_likes with zeros, we can then convert it to an integer with the .astype method:

In [60]:
fb_likes.dtype

dtype('float64')

In [61]:
fb_likes.fillna(0).astype(int).head()

0     1000
1    40000
2    11000
3    27000
4      131
Name: actor_1_facebook_likes, dtype: int32
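
As an alternative to filling with zeros, the nullable Int64 type mentioned above keeps the missing values while still storing integers. This is a sketch on a made-up Series (`likes` and its values are placeholders). I also pass "int64" explicitly here, on the assumption that the int32 in the output above comes from a platform-dependent default for `int`:

```python
import numpy as np
import pandas as pd

likes = pd.Series([1000.0, np.nan, 131.0])

# Option 1: replace missing values, then convert to a fixed-width integer.
filled = likes.fillna(0).astype("int64")

# Option 2: keep missing values by using the nullable Int64 extension dtype.
nullable = likes.astype("Int64")

print(filled)
print(nullable)
```

With option 2 the missing entry shows up as `<NA>` instead of being turned into a zero that looks like real data.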

**One potential downside of chaining is that debugging becomes difficult. Because none of the intermediate objects created during the method calls is stored in a variable, it can be hard to trace the exact location in the chain where an error occurred.**
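
One possible way to soften this is to drop a small inspection step into the chain with .pipe(). The `show_step` helper below is my own hypothetical function, not part of pandas, and the `directors` values are made up:

```python
import pandas as pd


def show_step(obj, label):
    """Hypothetical helper: print a short summary, then return the object unchanged."""
    print(f"{label}: type={type(obj).__name__}, length={len(obj)}")
    return obj


directors = pd.Series(["A", "B", "A", None, "C", "A"], dtype=object)

top = (
    directors
    .pipe(show_step, "raw")
    .value_counts()
    .pipe(show_step, "after value_counts")
    .head(2)
)
print(top)
```

Because the helper returns its input untouched, it can be removed later without changing the result of the chain.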

# Renaming column names

`The .rename DataFrame method accepts dictionaries that map the old value to the new value. Here's one for the columns:`

In [63]:
col_map = {
    'director_name': 'director',
    'num_critic_for_reviews': 'critic_reviews'
}

In [64]:
movies.rename(columns=col_map).head()

Unnamed: 0,color,director,critic_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0


**Cleaning up column names is important. The example below strips leading and trailing white space, sets all characters
to lower case, and replaces all spaces with underscores.**

In [79]:
cols = [
     col.strip().lower().replace(" ", "_")
     for col in movies.columns
]
movies.columns = cols
movies.head(3)

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
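
The movie.csv column names were already clean, so the cell above changes nothing visible. Here is a sketch of the same cleanup on deliberately messy, made-up column names. It passes a function to .rename() instead of assigning to .columns, so the original frame is left alone:

```python
import pandas as pd

# Hypothetical frame with untidy column names and no rows.
messy = pd.DataFrame(columns=[" Director Name", "IMDB Score ", "gross"])


def clean(name):
    """Strip outer white space, lower-case, and replace spaces with underscores."""
    return name.strip().lower().replace(" ", "_")


cleaned = messy.rename(columns=clean)

print(list(messy.columns))
print(list(cleaned.columns))
```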


# Creating and deleting columns

`One way to create a new column is to do an index assignment.`

>> movies['has_seen'] = 0

*This would create a new column indicating whether or not I have seen the movie: 1 for yes, 0 for no.*

**Note that this will not return a new DataFrame but mutate the existing DataFrame.** 

`If you assign the column to a scalar value,` **it will use that value for every cell in the column.**

*By default, new columns are appended to the end:*

`Using the .assign() method will return a new DataFrame with the new column:`
>> movies.assign(has_seen=0)
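
The difference between the two approaches can be checked on a small made-up frame (`films` and its titles are placeholders, not rows from movie.csv):

```python
import pandas as pd

films = pd.DataFrame({"title": ["Film A", "Film B"]})

# .assign() returns a new DataFrame and leaves films untouched.
with_flag = films.assign(has_seen=0)
columns_before = list(films.columns)

# Index assignment mutates films itself; the scalar fills every row.
films["has_seen"] = 0
columns_after = list(films.columns)

print(columns_before)
print(columns_after)
```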

---

**There are several columns that contain data on the number of Facebook likes. I will add up all actor and director Facebook like columns and assign them to the total_likes column. This can be done in a couple of ways.**

(1.) We can add each of the columns:

*Note that this is not "in place"*

In [89]:
total = (
     movies["actor_1_facebook_likes"]
     + movies["actor_2_facebook_likes"]
     + movies["actor_3_facebook_likes"]
     + movies["director_facebook_likes"]
)

movies.assign(total_likes=total)

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes,total_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000,2791.0
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0,46563.0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000,11554.0
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000,95000.0
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,12.0,7.1,,0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,English,Canada,,,2013.0,470.0,7.7,,84,1427.0
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,English,USA,TV-14,,,593.0,7.5,16.00,32000,
4913,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,English,USA,,1400.0,2013.0,0.0,6.3,,16,0.0
4914,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660,2154.0


In [93]:
movies #This is to show that the action was not 'in place'

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
4913,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
4914,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


(2.) I can chain, while calling .sum().

>I will pass in a list of columns and use .loc to pull out just those columns:

In [94]:
cols = [
     "actor_1_facebook_likes",
     "actor_2_facebook_likes",
     "actor_3_facebook_likes",
     "director_facebook_likes",
]
sum_col = movies.loc[:, cols].sum(axis="columns")
sum_col.head(5)

0     2791.0
1    46563.0
2    11554.0
3    95000.0
4      274.0
dtype: float64

In [95]:
movies.assign(total_likes=sum_col).head(5)

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes,total_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000,2791.0
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0,46563.0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000,11554.0
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000,95000.0
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,12.0,7.1,,0,274.0


**Note that when I used the (+) operator, the result had missing values (NaN), but the .sum() method ignores missing values by default, so I get a different result.**

>When numeric columns are added to one another using the plus operator, the result is NaN in any row where at least one value is missing. The .sum() method, by contrast, skips NaN by default (skipna=True), which has the same effect as treating missing values as zero.
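
Here is a small sketch of that difference on made-up numbers (the `likes` frame and its column names are placeholders). It also shows the `min_count` parameter of .sum(), which is one way to keep a row as NaN when every value in it is missing:

```python
import numpy as np
import pandas as pd

likes = pd.DataFrame(
    {
        "actor_1": [1000.0, 131.0, np.nan],
        "actor_2": [936.0, 12.0, np.nan],
        "director": [0.0, np.nan, np.nan],
    }
)

# Operator: one missing value in a row makes the whole row NaN.
plus_total = likes["actor_1"] + likes["actor_2"] + likes["director"]

# Method: missing values are skipped, so an all-missing row sums to 0.
sum_total = likes.sum(axis="columns")

# min_count=1 requires at least one real value, otherwise the result is NaN.
strict_total = likes.sum(axis="columns", min_count=1)

print(plus_total)
print(sum_total)
print(strict_total)
```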


---

**Note:** `It is possible to insert a new column into a specific location in a DataFrame with the .insert() method.`

`The .insert method takes the integer position of the new column as its first argument, the name of the new column as its second, and the values as its third. You will need to use the .get_loc Index method to find the integer location of the column name.`

**Keep in mind:**

`The .insert() method modifies the calling DataFrame in-place, so there won't be an assignment statement. It also returns None.`
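
A minimal sketch of .insert() together with .get_loc, using a made-up frame (`films`, its values, and the `profit` column are all placeholders):

```python
import pandas as pd

films = pd.DataFrame(
    {
        "title": ["Film A", "Film B"],
        "gross": [100.0, 250.0],
        "budget": [60.0, 300.0],
    }
)

# Find where "gross" sits, then place the new column right after it.
position = films.columns.get_loc("gross") + 1

# .insert() changes films in place and returns None.
result = films.insert(
    loc=position,
    column="profit",
    value=films["gross"] - films["budget"],
)

print(result)
print(list(films.columns))
```

Running the same .insert() call a second time raises an error, because by default the method refuses to add a column name that already exists.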

# Dropping Columns

`Syntax to drop a column:   df.drop(columns='column_name')`


`An alternative to deleting columns with the .drop method is to use the del statement.`
*The del statement modifies the DataFrame in place and does not return a new DataFrame, so favor .drop over it.*

`Syntax:      del df['column_name']`
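
Both approaches side by side, as a sketch on a made-up frame (`films` and its values are placeholders):

```python
import pandas as pd

films = pd.DataFrame(
    {
        "title": ["Film A", "Film B"],
        "gross": [100.0, 250.0],
        "budget": [60.0, 300.0],
    }
)

# .drop() returns a new DataFrame; films keeps all three columns.
trimmed = films.drop(columns="budget")
columns_after_drop = list(films.columns)

# del removes the column from films itself.
del films["gross"]
columns_after_del = list(films.columns)

print(list(trimmed.columns))
print(columns_after_drop)
print(columns_after_del)
```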