# (Part 5) Updating Rows and Columns - Modifying Data Within DataFrames

#### In this Python Programming video, we will learn how to modify the data within our DataFrames. We will use some of the filtering techniques from the last video to update values conditionally, and we will also learn how to use the apply, map, and applymap methods. Let's get started...

In [351]:
people = {
    "first": ["Corey", "Jane", "Jhon"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchager@gmail.com", "loco@gamil.com", "lineal@gmail.com"],
}

In [352]:
import pandas as pd

In [353]:
df = pd.DataFrame(people)

In [354]:
df

|   | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchager@gmail.com |
| 1 | Jane | Doe | loco@gamil.com |
| 2 | Jhon | Doe | lineal@gmail.com |


In [355]:
df.columns

Index(['first', 'last', 'email'], dtype='object')

In [356]:
df.columns = ['first_name', 'last_name', 'email']

In [357]:
df

|   | first_name | last_name | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchager@gmail.com |
| 1 | Jane | Doe | loco@gamil.com |
| 2 | Jhon | Doe | lineal@gmail.com |


In [358]:
df.columns = [x.lower() for x in df.columns]
df

|   | first_name | last_name | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchager@gmail.com |
| 1 | Jane | Doe | loco@gamil.com |
| 2 | Jhon | Doe | lineal@gmail.com |


In [359]:
df.columns = df.columns.str.replace(' ', "_")
df

|   | first_name | last_name | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchager@gmail.com |
| 1 | Jane | Doe | loco@gamil.com |
| 2 | Jhon | Doe | lineal@gmail.com |


In [360]:
df.rename(columns={'first_name': 'first', 'last_name': 'last'}, inplace=True)
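The column-cleanup steps above can be chained together. Here is a minimal sketch, assuming a small hypothetical DataFrame with messy headers ('First Name', 'Last Name', 'Email'). The helper name `clean_columns` is illustrative only and is not part of pandas.

```python
import pandas as pd

def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with lower-case, underscore-separated column names."""
    out = df.copy()
    # Vectorized string methods work directly on the column Index
    out.columns = out.columns.str.lower().str.replace(" ", "_")
    return out

raw = pd.DataFrame({"First Name": ["Corey"], "Last Name": ["Schafer"], "Email": ["a@example.com"]})
clean = clean_columns(raw)
print(list(clean.columns))  # ['first_name', 'last_name', 'email']

# rename() with a dict only touches the listed columns; without
# inplace=True it returns a new DataFrame and leaves `clean` unchanged
renamed = clean.rename(columns={"first_name": "first", "last_name": "last"})
print(list(renamed.columns))  # ['first', 'last', 'email']
```

Returning a copy instead of using `inplace=True` keeps the original `raw` frame intact. This makes the cleanup step easy to rerun in a notebook.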

In [361]:
df.loc[2] = ['Imanol','Aguer','imanolaguer1@gmail.com']
df

|   | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchager@gmail.com |
| 1 | Jane | Doe | loco@gamil.com |
| 2 | Imanol | Aguer | imanolaguer1@gmail.com |


In [362]:
df.loc[2, ['last','email']] = ['Doe','lineal@gmail.com']
df

|   | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchager@gmail.com |
| 1 | Jane | Doe | loco@gamil.com |
| 2 | Imanol | Doe | lineal@gmail.com |


In [363]:
# A common mistake when trying to change data: chained indexing (df[filt]['last'])
filt = (df['email'] == 'lineal@gmail.com')
df[filt]['last'] = 'Smith'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[filt]['last'] = 'Smith'


In [364]:
# This is how you fix it: the correct way to modify data in a DataFrame is a single .loc call
filt = (df['email'] == 'lineal@gmail.com')
df.loc[filt, 'last'] = 'Pnachito'  # a scalar is assigned to every matching row
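To show why the chained version fails while `.loc` works, here is a small self-contained sketch. It uses hypothetical placeholder emails for the first two rows. The warning is silenced only so the demo runs quietly; depending on the pandas version, it is a SettingWithCopyWarning or a Copy-on-Write chained-assignment warning.

```python
import warnings
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane", "Imanol"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["corey@example.com", "jane@example.com", "lineal@gmail.com"],
})
filt = df["email"] == "lineal@gmail.com"

# Chained indexing: df[filt] builds a new (copied) DataFrame, so the
# assignment lands on that temporary copy and `df` is not modified
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the chained-assignment warning for the demo
    df[filt]["last"] = "Smith"
print(df.loc[2, "last"])  # Doe

# A single .loc call selects rows and column together and writes into df
df.loc[filt, "last"] = "Smith"
print(df.loc[2, "last"])  # Smith
```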

In [365]:
df['email'] = df['email'].str.lower()
df

|   | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | coreymschager@gmail.com |
| 1 | Jane | Doe | loco@gamil.com |
| 2 | Imanol | Pnachito | lineal@gmail.com |


#### Remember these methods
##### 1. apply 2. map 3. applymap 4. replace

### apply method

In [366]:
df['email'].apply(len)

0    23
1    14
2    16
Name: email, dtype: int64

In [367]:
def update_email(email):
    return email.upper()

In [368]:
df['email'] = df['email'].apply(update_email)
# df

In [369]:
df['email'] = df['email'].apply(lambda x: x.lower())  # This is a lambda function
df

|   | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | coreymschager@gmail.com |
| 1 | Jane | Doe | loco@gamil.com |
| 2 | Imanol | Pnachito | lineal@gmail.com |


In [370]:
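df.apply(len)  # On a DataFrame, apply() passes each column (a Series), so len gives the number of rows per column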

first    3
last     3
email    3
dtype: int64

In [371]:
df.apply(pd.Series.min)

first                      Corey
last                         Doe
email    coreymschager@gmail.com
dtype: object
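This small sketch runs on a two-column copy of the example data and makes the difference between `Series.apply` and `DataFrame.apply` explicit. The row-wise `axis=1` call and the `initials` result are illustrative additions that do not appear in the video.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane", "Imanol"],
    "last": ["Schafer", "Doe", "Pnachito"],
})

# Series.apply: the function receives one value at a time
first_lengths = df["first"].apply(len)

# DataFrame.apply (default axis=0): the function receives each column
rows_per_column = df.apply(len)

# DataFrame.apply with axis=1: the function receives each row as a Series
initials = df.apply(lambda row: row["first"][0] + row["last"][0], axis=1)
print(initials.tolist())  # ['CS', 'JD', 'IP']
```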

### applymap method

In [372]:
df.applymap(len)  # <---- gives you len() of each individual element of the DataFrame (pandas >= 2.1 deprecates applymap in favor of DataFrame.map)

|   | first | last | email |
|---|---|---|---|
| 0 | 5 | 7 | 23 |
| 1 | 4 | 3 | 14 |
| 2 | 6 | 8 | 16 |


In [373]:
df.applymap(str.upper)

|   | first | last | email |
|---|---|---|---|
| 0 | COREY | SCHAFER | COREYMSCHAGER@GMAIL.COM |
| 1 | JANE | DOE | LOCO@GAMIL.COM |
| 2 | IMANOL | PNACHITO | LINEAL@GMAIL.COM |
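pandas 2.1 renamed `DataFrame.applymap` to `DataFrame.map`, so the calls above emit a deprecation warning on recent versions. A minimal sketch that uses whichever element-wise method the installed pandas provides, assuming the `hasattr` check is an acceptable way to detect the version:

```python
import pandas as pd

df = pd.DataFrame({"first": ["Corey", "Jane"], "last": ["Schafer", "Doe"]})

# DataFrame.map exists from pandas 2.1 onward; older versions only have applymap
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

lengths = elementwise(len)       # len() of every individual cell
upper = elementwise(str.upper)   # str.upper() of every individual cell
print(lengths)
print(upper)
```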


### map method 

In [374]:
df['first'].map({'Corey': 'Lionel', 'Jane': 'Cristiano'})  # Note: the result is not assigned back, so the change is not permanent; values missing from the dict become NaN

0       Lionel
1    Cristiano
2          NaN
Name: first, dtype: object

In [375]:
df['first'] = df['first'].replace({'Corey': 'Lionel', 'Jane': 'Cristiano'})  # Note: here the change persists because the result is assigned back; values missing from the dict stay unchanged
df

|   | first | last | email |
|---|---|---|---|
| 0 | Lionel | Schafer | coreymschager@gmail.com |
| 1 | Cristiano | Doe | loco@gamil.com |
| 2 | Imanol | Pnachito | lineal@gmail.com |
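The key difference between the two cells above is how each method handles values that are not in the dict. This is a minimal side-by-side sketch on a standalone Series with the same names. The final `fillna` line shows one way, among others, to make `map` behave like `replace`.

```python
import pandas as pd

names = pd.Series(["Corey", "Jane", "Imanol"], name="first")
mapping = {"Corey": "Lionel", "Jane": "Cristiano"}

mapped = names.map(mapping)        # values missing from the dict become NaN
replaced = names.replace(mapping)  # values missing from the dict are kept

print(mapped.tolist())    # ['Lionel', 'Cristiano', nan]
print(replaced.tolist())  # ['Lionel', 'Cristiano', 'Imanol']

# map() followed by fillna() with the original values mimics replace()
filled = names.map(mapping).fillna(names)
```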


# Working with Stack Overflow Data

In [376]:
dfs = pd.read_csv('../data/survey_results_public.csv')
schema_dfs = pd.read_csv('../data/survey_results_schema.csv')

In [377]:
dfs.head(20)

|   | ResponseId | MainBranch | Employment | RemoteWork | CodingActivities | EdLevel | LearnCode | LearnCodeOnline | LearnCodeCoursesCert | YearsCode | ... | TimeSearching | TimeAnswering | Onboarding | ProfessionalTech | TrueFalse_1 | TrueFalse_2 | TrueFalse_3 | SurveyLength | SurveyEase | ConvertedCompYearly |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | None of these | | | | | | | | | ... | | | | | | | | | | |
| 1 | 2 | I am a developer by profession | Employed, full-time | Fully remote | Hobby;Contribute to open-source projects | | | | | | ... | | | | | | | | Too long | Difficult | |
| 2 | 3 | I am not primarily a developer, but I write co... | Employed, full-time | Hybrid (some remote, some in-person) | Hobby | Master’s degree (M.A., M.S., M.Eng., MBA, etc.) | Books / Physical media;Friend or family member... | Technical documentation;Blogs;Programming Game... | | 14.0 | ... | | | | | | | | Appropriate in length | Neither easy nor difficult | 40205.0 |
| 3 | 4 | I am a developer by profession | Employed, full-time | Fully remote | I don’t code outside of work | Bachelor’s degree (B.A., B.S., B.Eng., etc.) | Books / Physical media;School (i.e., Universit... | | | 20.0 | ... | | | | | | | | Appropriate in length | Easy | 215232.0 |
| 4 | 5 | I am a developer by profession | Employed, full-time | Hybrid (some remote, some in-person) | Hobby | Bachelor’s degree (B.A., B.S., B.Eng., etc.) | Other online resources (e.g., videos, blogs, f... | Technical documentation;Blogs;Stack Overflow;O... | | 8.0 | ... | | | | | | | | Too long | Easy | |
| 5 | 6 | I am not primarily a developer, but I write co... | Student, full-time | | | Master’s degree (M.A., M.S., M.Eng., MBA, etc.) | Books / Physical media;School (i.e., Universit... | | | 15.0 | ... | | | | | | | | Appropriate in length | Easy | |
| 6 | 7 | I code primarily as a hobby | Student, part-time | | | Secondary school (e.g. American high school, G... | Other online resources (e.g., videos, blogs, f... | Stack Overflow;Video-based Online Courses | | 3.0 | ... | | | | | | | | Appropriate in length | Easy | |
| 7 | 8 | I am a developer by profession | Not employed, but looking for work | | | Some college/university study without earning ... | Online Courses or Certification | | Coursera;Udemy | 1.0 | ... | | | | | | | | Appropriate in length | Easy | |
| 8 | 9 | I am a developer by profession | Employed, full-time | Hybrid (some remote, some in-person) | I don’t code outside of work | Master’s degree (M.A., M.S., M.Eng., MBA, etc.) | On the job training;Coding Bootcamp | | | 6.0 | ... | 15-30 minutes a day | Over 120 minutes a day | Somewhat long | Innersource initiative;DevOps function;Microse... | Yes | Yes | Yes | Appropriate in length | Easy | 49056.0 |
| 9 | 10 | I am a developer by profession | Independent contractor, freelancer, or self-em... | Fully remote | Hobby;Contribute to open-source projects;Boots... | Some college/university study without earning ... | Books / Physical media;Other online resources ... | Technical documentation;Blogs;Written Tutorial... | | 37.0 | ... | | | | | | | | Appropriate in length | Easy | |


In [378]:
dfs['Currency']

0                              NaN
1             CAD\tCanadian dollar
2              GBP\tPound sterling
3          ILS\tIsraeli new shekel
4        USD\tUnited States dollar
                   ...            
73263    USD\tUnited States dollar
73264    USD\tUnited States dollar
73265    USD\tUnited States dollar
73266          GBP\tPound sterling
73267                          NaN
Name: Currency, Length: 73268, dtype: object

In [379]:
dfs.rename(columns={'Currency': 'LocalCurrency'}, inplace=True)

In [380]:
dfs['LocalCurrency']

0                              NaN
1             CAD\tCanadian dollar
2              GBP\tPound sterling
3          ILS\tIsraeli new shekel
4        USD\tUnited States dollar
                   ...            
73263    USD\tUnited States dollar
73264    USD\tUnited States dollar
73265    USD\tUnited States dollar
73266          GBP\tPound sterling
73267                          NaN
Name: LocalCurrency, Length: 73268, dtype: object

In [381]:
dfs['SurveyEase']

0                               NaN
1                         Difficult
2        Neither easy nor difficult
3                              Easy
4                              Easy
                    ...            
73263                          Easy
73264                          Easy
73265                          Easy
73266                          Easy
73267                          Easy
Name: SurveyEase, Length: 73268, dtype: object

In [382]:
dfs['SurveyEase'] = dfs['SurveyEase'].map({'Easy': 'Facil', 'Difficult': 'Dificil', 'Neither easy nor difficult':'ni pedos'})
# Sometimes .replace is more convenient than .map: map turns any value missing from the dict into NaN, while replace leaves it unchanged

In [383]:
dfs.head()

|   | ResponseId | MainBranch | Employment | RemoteWork | CodingActivities | EdLevel | LearnCode | LearnCodeOnline | LearnCodeCoursesCert | YearsCode | ... | TimeSearching | TimeAnswering | Onboarding | ProfessionalTech | TrueFalse_1 | TrueFalse_2 | TrueFalse_3 | SurveyLength | SurveyEase | ConvertedCompYearly |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | None of these | | | | | | | | | ... | | | | | | | | | | |
| 1 | 2 | I am a developer by profession | Employed, full-time | Fully remote | Hobby;Contribute to open-source projects | | | | | | ... | | | | | | | | Too long | Dificil | |
| 2 | 3 | I am not primarily a developer, but I write co... | Employed, full-time | Hybrid (some remote, some in-person) | Hobby | Master’s degree (M.A., M.S., M.Eng., MBA, etc.) | Books / Physical media;Friend or family member... | Technical documentation;Blogs;Programming Game... | | 14.0 | ... | | | | | | | | Appropriate in length | ni pedos | 40205.0 |
| 3 | 4 | I am a developer by profession | Employed, full-time | Fully remote | I don’t code outside of work | Bachelor’s degree (B.A., B.S., B.Eng., etc.) | Books / Physical media;School (i.e., Universit... | | | 20.0 | ... | | | | | | | | Appropriate in length | Facil | 215232.0 |
| 4 | 5 | I am a developer by profession | Employed, full-time | Hybrid (some remote, some in-person) | Hobby | Bachelor’s degree (B.A., B.S., B.Eng., etc.) | Other online resources (e.g., videos, blogs, f... | Technical documentation;Blogs;Stack Overflow;O... | | 8.0 | ... | | | | | | | | Too long | Facil | |
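Because `map` silently turns every unlisted category into NaN, it can help to check the mapping against the data before applying it. A minimal sketch using a short hypothetical stand-in for `dfs['SurveyEase']`, because the real column comes from the CSV. The mapping is deliberately left incomplete to show the check.

```python
import pandas as pd

# Hypothetical stand-in for dfs['SurveyEase']
ease = pd.Series(["Easy", "Difficult", None, "Neither easy nor difficult", "Easy"])
mapping = {"Easy": "Facil", "Difficult": "Dificil"}  # deliberately incomplete

# Categories present in the data but absent from the mapping;
# map() would silently turn these into NaN
unmapped = set(ease.dropna().unique()) - set(mapping)
print(unmapped)  # {'Neither easy nor difficult'}

# Comparing NaN counts before and after shows how many values map() discarded
lost = ease.map(mapping).isna().sum() - ease.isna().sum()
print(lost)  # 1
```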


# (Part 6) Add or Remove Rows and Columns From DataFrames 


### This is how we add columns to our DataFrame

In [384]:
df['first'] + " " + df['last']

0     Lionel Schafer
1      Cristiano Doe
2    Imanol Pnachito
dtype: object

In [385]:
df["Full_Name"] = df['first'] + " " + df['last'] # <--- Created a new column

In [386]:
df

|   | first | last | email | Full_Name |
|---|---|---|---|---|
| 0 | Lionel | Schafer | coreymschager@gmail.com | Lionel Schafer |
| 1 | Cristiano | Doe | loco@gamil.com | Cristiano Doe |
| 2 | Imanol | Pnachito | lineal@gmail.com | Imanol Pnachito |


### This is how we remove columns from our DataFrame

In [387]:
df.drop(columns=['first', 'last'], inplace=True)  # <--- deletes the first and last columns
df
# Hey Imanol, one more thing!
# You can also delete a single column with the del statement.
# For example: del df['Full_Name']  (column names are case-sensitive)

|   | email | Full_Name |
|---|---|---|
| 0 | coreymschager@gmail.com | Lionel Schafer |
| 1 | loco@gamil.com | Cristiano Doe |
| 2 | lineal@gmail.com | Imanol Pnachito |
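This sketch compares the ways to remove columns on a tiny hypothetical one-row DataFrame. `drop` and `del` are the two options mentioned above. `DataFrame.pop` is an extra option included here for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["a@example.com"],
    "Full_Name": ["Lionel Schafer"],
    "first": ["Lionel"],
    "last": ["Schafer"],
})

# drop() returns a new DataFrame unless inplace=True
trimmed = df.drop(columns=["first", "last"])

# del removes one column from df itself (names are case-sensitive)
del df["first"]

# pop() removes one column from df and hands it back as a Series
last = df.pop("last")

print(list(trimmed.columns))  # ['email', 'Full_Name']
print(list(df.columns))       # ['email', 'Full_Name']
print(last.tolist())          # ['Schafer']
```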


In [388]:
df['Full_Name'].str.split(" ", expand=True)

|   | 0 | 1 |
|---|---|---|
| 0 | Lionel | Schafer |
| 1 | Cristiano | Doe |
| 2 | Imanol | Pnachito |


In [389]:
df[['first','last']] = df['Full_Name'].str.split(" ", expand=True)

In [390]:
df

|   | email | Full_Name | first | last |
|---|---|---|---|---|
| 0 | coreymschager@gmail.com | Lionel Schafer | Lionel | Schafer |
| 1 | loco@gamil.com | Cristiano Doe | Cristiano | Doe |
| 2 | lineal@gmail.com | Imanol Pnachito | Imanol | Pnachito |
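Splitting on every space breaks when a surname contains spaces. This sketch uses hypothetical names, one of which has a multi-word surname, to show a combine-then-split round trip that limits the split with `n=1`. The column names `first_again` and `last_again` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"first": ["Lionel", "Robin"], "last": ["Schafer", "van Persie"]})

# Combine two string columns into one
df["Full_Name"] = df["first"] + " " + df["last"]

# Split it back; n=1 splits only on the first space, so a last name
# that itself contains spaces stays together in the second column
parts = df["Full_Name"].str.split(" ", n=1, expand=True)
parts.columns = ["first_again", "last_again"]

print(parts.values.tolist())  # [['Lionel', 'Schafer'], ['Robin', 'van Persie']]
```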


In [391]:
# df.drop(columns=['Full_Name'])

In [392]:
# df.append({'first': 'Tony'})  # DataFrame.append was deprecated and removed in pandas 2.0; use pd.concat instead
df = pd.concat([df, pd.DataFrame([{'first': 'Tony'}])])  # <-- adds a single row; without ignore_index=True the new row keeps index 0

In [393]:
df

|   | email | Full_Name | first | last |
|---|---|---|---|---|
| 0 | coreymschager@gmail.com | Lionel Schafer | Lionel | Schafer |
| 1 | loco@gamil.com | Cristiano Doe | Cristiano | Doe |
| 2 | lineal@gmail.com | Imanol Pnachito | Imanol | Pnachito |
| 0 | | | Tony | |
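The duplicated index label `0` in the output above comes from concatenating without `ignore_index=True`. This minimal sketch with a hypothetical two-row frame shows both behaviors. It also shows that columns missing from the new row are filled with NaN.

```python
import pandas as pd

df = pd.DataFrame({"first": ["Lionel", "Cristiano"], "last": ["Schafer", "Doe"]})
new_row = pd.DataFrame([{"first": "Tony"}])  # one-row DataFrame; "last" is missing

kept = pd.concat([df, new_row])                           # original labels kept: 0, 1, 0
renumbered = pd.concat([df, new_row], ignore_index=True)  # fresh labels: 0, 1, 2

print(kept.index.tolist())        # [0, 1, 0]
print(renumbered.index.tolist())  # [0, 1, 2]
```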


## Second DataFrame: people2

In [394]:
people2 = {
    "first": ["George", "Martin"],
    "last": ["Hotz", "Shkreli"],
    "email": ["Hotz@gmail.com", "Shkreli@gamil.com"],
}
df2 = pd.DataFrame(people2)
df2

|   | first | last | email |
|---|---|---|---|
| 0 | George | Hotz | Hotz@gmail.com |
| 1 | Martin | Shkreli | Shkreli@gamil.com |


In [395]:
df = pd.concat([df, df2], ignore_index=True, sort=False)  # <--- This replaces the append method used in the video (at 11:30)

In [396]:
df

|   | email | Full_Name | first | last |
|---|---|---|---|---|
| 0 | coreymschager@gmail.com | Lionel Schafer | Lionel | Schafer |
| 1 | loco@gamil.com | Cristiano Doe | Cristiano | Doe |
| 2 | lineal@gmail.com | Imanol Pnachito | Imanol | Pnachito |
| 3 | | | Tony | |
| 4 | Hotz@gmail.com | | George | Hotz |
| 5 | Shkreli@gamil.com | | Martin | Shkreli |


In [397]:
df.drop(index=3, inplace=True)  # <--- remove a single row
df

|   | email | Full_Name | first | last |
|---|---|---|---|---|
| 0 | coreymschager@gmail.com | Lionel Schafer | Lionel | Schafer |
| 1 | loco@gamil.com | Cristiano Doe | Cristiano | Doe |
| 2 | lineal@gmail.com | Imanol Pnachito | Imanol | Pnachito |
| 4 | Hotz@gmail.com | | George | Hotz |
| 5 | Shkreli@gamil.com | | Martin | Shkreli |


In [398]:
df.drop(index=[4, 5], inplace=True)  # <--- remove a group of rows
df

|   | email | Full_Name | first | last |
|---|---|---|---|---|
| 0 | coreymschager@gmail.com | Lionel Schafer | Lionel | Schafer |
| 1 | loco@gamil.com | Cristiano Doe | Cristiano | Doe |
| 2 | lineal@gmail.com | Imanol Pnachito | Imanol | Pnachito |
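`drop(index=...)` works with index labels, not positions, which matters once rows have been removed. A minimal sketch on a hypothetical five-row frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"first": ["Lionel", "Cristiano", "Imanol", "Tony", "George"]},
    index=[0, 1, 2, 3, 4],
)

one_gone = df.drop(index=3)             # drop a single row by its index label
two_gone = one_gone.drop(index=[0, 4])  # drop several rows at once

# After dropping label 3, label 4 still exists even though it now
# sits in the fourth position; drop() never renumbers the index
print(one_gone.index.tolist())  # [0, 1, 2, 4]
print(two_gone.index.tolist())  # [1, 2]
```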


In [406]:
df.drop(index=df[df['last'] == 'Doe'].index, inplace=True)  # <--- remove rows that match a condition
# Also remember that you can store the condition in a variable and pass the matching index labels: df.drop(index=df[filt].index)
filt = df['last'] == 'Doe'  # <-- the same condition stored in a variable

In [407]:
df

|   | email | Full_Name | first | last |
|---|---|---|---|---|
| 0 | coreymschager@gmail.com | Lionel Schafer | Lionel | Schafer |
| 2 | lineal@gmail.com | Imanol Pnachito | Imanol | Pnachito |
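Removing rows by a condition can also be written as a filter that keeps the complement. This sketch on a hypothetical copy of the data checks that both forms give the same result. The inverted-mask form (`~filt`) is an alternative to the `drop` call shown above, not the method used in the video.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Lionel", "Cristiano", "Imanol"],
    "last": ["Schafer", "Doe", "Pnachito"],
})
filt = df["last"] == "Doe"

# Option 1: drop the index labels of the matching rows
dropped = df.drop(index=df[filt].index)

# Option 2: keep the non-matching rows using the inverted mask (~)
kept = df[~filt]

print(dropped.equals(kept))       # True
print(dropped["first"].tolist())  # ['Lionel', 'Imanol']
```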
