# Manipulating text in DataFrames
Pandas calls it _manipulating textual data_. Text is one of the most common data types you will encounter in datasets, alongside integers, floats, and booleans.
Being able to process and manipulate text is useful when you need to normalize the values in cells, for example by shortening labels or extracting part of a string.
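
As a minimal sketch of what normalizing cell values can look like, the snippet below cleans up inconsistent whitespace and casing with the `.str` accessor. The sample values are hypothetical and are not taken from the wine ratings dataset.

```python
import pandas as pd

# Hypothetical sample values with inconsistent whitespace and casing
raw = pd.Series(["  Red Wine", "red wine ", "RED WINE", None])

# .str methods run element-wise; missing values stay missing
normalized = raw.str.strip().str.lower()
print(normalized)
```

Chaining `.str` calls like this keeps each step readable, and missing entries pass through without raising an error.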

In [1]:
# Load your dataframe
import pandas as pd
csv_url = "https://raw.githubusercontent.com/paiml/wine-ratings/main/wine-ratings.csv"
df = pd.read_csv(csv_url, index_col=0)
df.head()

Unnamed: 0,name,grape,region,variety,rating,notes
0,1000 Stories Bourbon Barrel Aged Batch Blue Ca...,,"Mendocino, California",Red Wine,91.0,"This is a very special, limited release of 100..."
1,1000 Stories Bourbon Barrel Aged Gold Rush Red...,,California,Red Wine,89.0,The California Gold Rush was a period of coura...
2,1000 Stories Bourbon Barrel Aged Gold Rush Red...,,California,Red Wine,90.0,The California Gold Rush was a period of coura...
3,1000 Stories Bourbon Barrel Aged Zinfandel 2013,,"North Coast, California",Red Wine,91.0,"The wine has a deep, rich purple color. An int..."
4,1000 Stories Bourbon Barrel Aged Zinfandel 2014,,California,Red Wine,90.0,Batch #004 is the first release of the 2014 vi...


In [4]:
# Add a short code for the variety: R for red, W for white
df["variety_short"] = df["variety"].replace({"Red Wine": "R", "White Wine": "W"})
df.head()

Unnamed: 0,name,grape,region,variety,rating,notes,variety_short
0,1000 Stories Bourbon Barrel Aged Batch Blue Ca...,,"Mendocino, California",Red Wine,91.0,"This is a very special, limited release of 100...",R
1,1000 Stories Bourbon Barrel Aged Gold Rush Red...,,California,Red Wine,89.0,The California Gold Rush was a period of coura...,R
2,1000 Stories Bourbon Barrel Aged Gold Rush Red...,,California,Red Wine,90.0,The California Gold Rush was a period of coura...,R
3,1000 Stories Bourbon Barrel Aged Zinfandel 2013,,"North Coast, California",Red Wine,91.0,"The wine has a deep, rich purple color. An int...",R
4,1000 Stories Bourbon Barrel Aged Zinfandel 2014,,California,Red Wine,90.0,Batch #004 is the first release of the 2014 vi...,R
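
One detail worth knowing about the cell above: `replace` with a dictionary only touches values that appear as keys, so any other variety keeps its full name. The sketch below contrasts that with `Series.map`, which turns unmapped values into missing values. The third sample value is a hypothetical label used only for illustration.

```python
import pandas as pd

# Hypothetical sample; the third label has no entry in the mapping
variety = pd.Series(["Red Wine", "White Wine", "Sparkling & Champagne"])
codes = {"Red Wine": "R", "White Wine": "W"}

replaced = variety.replace(codes)  # values without a key are left unchanged
mapped = variety.map(codes)        # values without a key become NaN

print(replaced.tolist())
print(mapped.isna().tolist())
```

Which behavior is preferable depends on the goal: `replace` is forgiving, while `map` makes unexpected categories easy to spot.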


In [13]:
# Split the region on commas and keep only the last part (usually the state or country)
# Note: you can overwrite the original column or, as here, create a new one
df["region_short"] = df["region"].str.split(",").str.get(-1).str.strip()
df.query("region_short != 'California'").head()

Unnamed: 0,name,grape,region,variety,rating,notes,variety_short,region_short
7,12 Linajes Crianza 2014,,Spain,Red Wine,92.0,Red with violet hues. The aromas are very inte...,R,Spain
8,12 Linajes Reserva 2012,,Spain,Red Wine,94.0,"On the nose, a complex predominance of mineral...",R,Spain
9,14 Hands Cabernet Sauvignon 2010,,Washington,Red Wine,87.0,Concentrated aromas of dark stone fruits and t...,R,Washington
10,14 Hands Cabernet Sauvignon 2011,,Washington,Red Wine,89.0,Concentrated aromas of dark stone fruits and t...,R,Washington
11,14 Hands Cabernet Sauvignon 2015,,Washington,Red Wine,89.0,"The 14 Hands Cabernet Sauvignon is a rich, jui...",R,Washington
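
The choice of separator matters when a region name contains spaces. The sketch below, using hypothetical region values, compares splitting on whitespace with splitting on commas.

```python
import pandas as pd

# Hypothetical region values, including a multi-word country name
region = pd.Series(["North Coast, California", "Marlborough, New Zealand", "Spain"])

# Splitting on whitespace keeps only the last word
by_space = region.str.split().str.get(-1)

# Splitting on commas keeps the full last part; strip() removes the leading space
by_comma = region.str.split(",").str.get(-1).str.strip()

print(by_space.tolist())
print(by_comma.tolist())
```

Whitespace splitting truncates "New Zealand" to "Zealand", while comma splitting keeps the full name. Values without a comma, such as "Spain", come out the same either way.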
