## Challenge 2: Regex

### 1. Objectives:
- Practice regular expressions with a real dataset
 
---
    
### 2. Development:

We are going to practice regular expressions using a dataset called 'amazon_fine_food_reviews-clean.csv'. This dataset is actually a subset of a larger one that comes from [this source](https://www.kaggle.com/snap/amazon-fine-food-reviews). It contains reviews of many different products written by Amazon users. The 'text' column contains the text of each review, and that is the column we are interested in.

We are going to practice regular expressions on that column. Each search you run will give you a new subset of the data with a specific size. When you finish your searches, compare the sizes of your subsets with those of your classmates to check that your answers are correct.

Your challenge, then, is to obtain subsets of the data with these characteristics:

1. All reviews that contain the word 'food' (in lowercase).
2. All reviews that contain a two-digit number.
3. All reviews that contain a percentage (one or more digits followed by a percent sign).
4. All reviews that begin with the word 'Dog' or 'dog'.
5. All reviews that end with the fragment 'awesome.' (note that there is specifically a period after the word 'awesome').
6. All reviews that contain the words 'horrible' **or** 'terrible'.
7. All reviews that contain **only** lowercase letters.

After running these explorations, clean your dataset by removing the following from all of your texts:

1. Anything resembling `<br>` or `<br/>` (check for variations of these tags, with spaces in between, for example)
2. Punctuation and symbols in general
3. Digits
4. Anything else that does not seem relevant to our natural language analysis

Also convert all letters to lowercase to homogenize our dataset.

Save your dataset as a 'csv' file so that you can use it in the upcoming challenges (make sure to include **at least** the 'text' and 'score' columns).

In [50]:
import pandas as pd
import re

In [51]:
df = pd.read_csv('../../Datasets/amazon_fine_food_reviews-clean.csv')

df.head()

Unnamed: 0,id,product_id,user_id,profile_name,helpfulness_numerator,helpfulness_denominator,score,time,summary,text
0,258510,B00168V34W,A1672LH9S1XO70,"Lorna J. Loomis ""Canadian Dog Fancier""",13,14,3,1266796800,"Misleading to refer to ""PODS""","This coffee does NOT come in individual ""PODS""..."
1,207915,B000CQID2Y,A42CJC66XO0H7,"Scott Schimmel ""A Butterfly Dreaming""",2,2,5,1279497600,Delicious,I was a little skeptical after looking at the ...
2,522649,B007TJGZ0Y,A16QZBG2UN6Z3X,"Toology ""Toology""",0,0,5,1335830400,One of my favs,Gloia Jeans Butter Toffee is one of my favorit...
3,393368,B000W7PUOW,A3J21CQZG60K35,Hsieh Pei Hsuan,2,2,5,1265673600,Tasty!!,My families and friends love Planters peanuts ...
4,178178,B002FX2IOQ,A1Z7XV6JU0EV8M,"Barbara ""Barbara""",1,6,1,1301788800,"Organic Valley White 1 % Milkfat Lowfat Milk, ...","Organic Valley White 1 % Milkfat Lowfat Milk, ..."


In [68]:
columna = df[['text','score']]
columna

Unnamed: 0,text,score
0,"This coffee does NOT come in individual ""PODS""...",3
1,I was a little skeptical after looking at the ...,5
2,Gloia Jeans Butter Toffee is one of my favorit...,5
3,My families and friends love Planters peanuts ...,5
4,"Organic Valley White 1 % Milkfat Lowfat Milk, ...",1
...,...,...
14206,This tea certainly exceeded my expectations! ...,5
14207,I had these at a conference once. I have been ...,4
14208,I have enjoyed using the maple flavor. It adds...,5
14209,"When I recently started using K-cups, wasn't s...",5


In [61]:
columna[columna['text'].str.contains('food', case=True)]

Unnamed: 0,text,score
5,I adopted a rescue dog who had an allergy to c...,5
18,My cat Mack has a very sensitive stomach and w...,5
24,This has got to be one of the great superfoods...,5
35,My cats are on a diet of super high quality dr...,5
40,I've lived in apartments with no outdoor space...,5
...,...,...
14172,I am always looking for good quality cat food ...,2
14195,"Quinoa adapts to a variety of seasonings, from...",5
14198,This is a nice freezer tray. The base is so su...,4
14203,I started out buying this to use in my homemad...,5
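
Note that `'food'` is matched as a substring, so a row such as index 24 ("superfoods") is included. If the intent is the standalone word, a word boundary (`\b`) is one option. The following is a minimal sketch using the `re` module on made-up sample strings (not taken from the dataset); `str.contains` applies the same `re.search` semantics to each row.

```python
import re

# Hypothetical sample reviews, for illustration only
samples = [
    "This is great cat food.",
    "One of the great superfoods!",
    "Food arrived late.",
]

# Substring match: also hits 'superfoods'
substring = [bool(re.search(r'food', s)) for s in samples]

# Whole-word match: 'food' must be delimited by word boundaries
whole_word = [bool(re.search(r'\bfood\b', s)) for s in samples]

print(substring)   # [True, True, False]
print(whole_word)  # [True, False, False]
```

Summing the boolean list (or calling `.sum()` on the boolean Series returned by `str.contains`) gives the subset size to compare with classmates.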


In [62]:
columna[columna['text'].str.contains('[0-9][0-9]')]

Unnamed: 0,text,score
4,"Organic Valley White 1 % Milkfat Lowfat Milk, ...",1
5,I adopted a rescue dog who had an allergy to c...,5
10,I have a one year old Pomeranian whom has been...,5
11,This thing makes a great present for any choco...,5
12,This has been a great find. My grandmother pas...,5
...,...,...
14194,Rcv'd on 20/Nov/06 product w expiry date of 23...,3
14196,"Once when driving on I 95 north, near New York...",5
14205,"Its awesome, perfect. just like the fair! grea...",5
14207,I had these at a conference once. I have been ...,4
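
The pattern `[0-9][0-9]` matches any two consecutive digits, so it also selects reviews whose only numbers have three or more digits. If "two-digit number" is read strictly, lookarounds can exclude longer numbers. This is a sketch on made-up strings, assuming that stricter reading is the one intended.

```python
import re

# Hypothetical sample reviews, for illustration only
samples = [
    "I ate 12 bars",
    "Over 1000 sold",
    "Only 5 left",
    "Born in 1999, aged 24",
]

# Two consecutive digits anywhere (the pattern used in the cell above)
two_or_more = [bool(re.search(r'[0-9][0-9]', s)) for s in samples]

# Exactly two digits: no digit immediately before or after
exactly_two = [
    bool(re.search(r'(?<![0-9])[0-9]{2}(?![0-9])', s)) for s in samples
]

print(two_or_more)  # [True, True, False, True]
print(exactly_two)  # [True, False, False, True]
```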


In [63]:
columna[columna['text'].str.contains('[0-9]+%')]

Unnamed: 0,text,score
22,I just finished my last Vita Coco that I odere...,5
69,"We use this coffee in a fully automatic ""Saeco...",5
146,I love these Blue Diamond Almond snacks and I ...,5
155,"I was surprised at the ingredient list and ""10...",2
169,I went through 12 cans in about a week. Soooo...,5
...,...,...
13973,This is a really good choice for a nutritional...,5
14020,We mostly eat healthy foods--I'm a vegetarian ...,4
14167,"I've ordered literally hundreds, probably into...",5
14191,I thought I'd give Blue Horse's 100% Kona Coff...,5
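
The pattern `[0-9]+%` requires the percent sign to follow the digits immediately. Some reviews in this dataset write percentages with a space (row 4 contains "1 % Milkfat"). Whether those should count is a judgment call; a sketch that tolerates optional whitespace, on made-up strings:

```python
import re

# Hypothetical sample reviews, for illustration only
samples = [
    "100% Kona coffee",
    "White 1 % Milkfat milk",
    "Fifty percent off",
]

# Digits immediately followed by '%'
adjacent = [bool(re.search(r'[0-9]+%', s)) for s in samples]

# Digits, optional whitespace, then '%'
spaced = [bool(re.search(r'[0-9]+\s*%', s)) for s in samples]

print(adjacent)  # [True, False, False]
print(spaced)    # [True, True, False]
```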


In [64]:
columna[columna['text'].str.contains('^Dog',case=False)]

Unnamed: 0,text,score
755,Dogs LOVE Greenies! They go crazy if they eve...,5
4853,"Dogs liked it ""okay."" Did NOT give it th the ...",2
9693,Dog loves these. Eats them quickly and without...,4
13252,Dogs probably don't really want to spend most ...,5
14131,Dogs love it. It smells like chicken. I have b...,5
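
With `case=False`, the pattern `^Dog` also accepts spellings such as 'DOG', and without a word boundary it accepts longer words such as 'Dogs'. The sketch below compares three variants on made-up strings; which one matches the challenge depends on how strictly "the word 'Dog' or 'dog'" is read.

```python
import re

# Hypothetical sample reviews, for illustration only
samples = [
    "Dog loves these.",
    "dogs love it",
    "DOG FOOD!",
    "My dog loves it",
    "Dogma is a film",
]

# Equivalent to str.contains('^Dog', case=False)
case_insensitive = [bool(re.search(r'^dog', s, flags=re.IGNORECASE)) for s in samples]

# Only 'Dog' or 'dog' as a prefix
char_class = [bool(re.search(r'^[Dd]og', s)) for s in samples]

# Only 'Dog' or 'dog' as a complete first word
whole_word = [bool(re.search(r'^[Dd]og\b', s)) for s in samples]

print(case_insensitive)  # [True, True, True, False, True]
print(char_class)        # [True, True, False, False, True]
print(whole_word)        # [True, False, False, False, False]
```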


In [65]:
columna[columna['text'].str.contains('Awesome.$', case=False)]

Unnamed: 0,text,score
396,I love Tiger Sauce! I eat it with cream cheese...,5
1304,"If you want to improve your milk production, t...",3
5437,I've always loved Beetlejuice and it's no surp...,5
6340,I have two extremely picky eaters. They truly ...,5
7359,This is a deliscious drink.I have trouble keep...,5
8340,My boyfriend and I have been trying various bo...,5
8659,"I am a frequent purchaser on Amazon, yet I can...",5
10035,"These are decent, but if you want a realllly g...",3
10341,"these are awesome. though they're gluten free,...",5
10565,I had never heard of Dende Oil before I read a...,5
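
In the pattern `'Awesome.$'` the dot is unescaped, so it matches any character, and `case=False` also accepts 'Awesome'. The challenge asks specifically for the lowercase fragment 'awesome.' with a literal period, which would require escaping the dot. A sketch of the difference on made-up strings:

```python
import re

# Hypothetical sample reviews, for illustration only
samples = [
    "These are awesome.",
    "Totally Awesome!",
    "awesome. though they're gluten free",
    "It was awesome",
]

# Equivalent to str.contains('Awesome.$', case=False): '.' is a wildcard
loose = [bool(re.search(r'Awesome.$', s, flags=re.IGNORECASE)) for s in samples]

# Literal period, case-sensitive
strict = [bool(re.search(r'awesome\.$', s)) for s in samples]

print(loose)   # [True, True, False, False]
print(strict)  # [True, False, False, False]
```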


In [66]:
columna[columna['text'].str.contains('horrible|terrible',case=False)]

Unnamed: 0,text,score
355,"Within our family, my husband has diabetic neu...",2
409,This tea is terrible; cheap store tea bags tas...,1
441,"My lab has a very sensitive stomach, and for t...",5
559,These Beans are wonderful. I honestly recommen...,5
656,Every once in awhile I like to try different f...,1
...,...,...
13820,These are not chips. Chips implies a crunchine...,1
13847,I had major issues with Amazon.com in regards ...,5
13908,I became interested in ACV when a coworker men...,5
13999,I just bought these for my daughter today at o...,5


In [58]:
columna[columna['text'].str.contains('^[a-z]+$')]

Unnamed: 0,text,score
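
The empty result is expected: `^[a-z]+$` only matches texts made up entirely of the letters a-z, with no spaces or punctuation. If the intent is instead "no uppercase letters", other readings are possible. The sketch below compares three interpretations on made-up strings; which one the challenge intends is an assumption.

```python
import re

# Hypothetical sample reviews, for illustration only
samples = [
    "yummy",
    "great product",
    "these are awesome. love them",
    "Great taste",
]

# Only a-z, nothing else (the pattern used in the cell above)
strict = [bool(re.search(r'^[a-z]+$', s)) for s in samples]

# Lowercase letters and whitespace
with_spaces = [bool(re.search(r'^[a-z\s]+$', s)) for s in samples]

# Anything except uppercase A-Z
no_upper = [bool(re.search(r'^[^A-Z]+$', s)) for s in samples]

print(strict)       # [True, False, False, False]
print(with_spaces)  # [True, True, False, False]
print(no_upper)     # [True, True, True, False]
```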


In [69]:
columna['text'] = columna['text'].str.replace(r'[|\^&+\-%*/=!>]', '', regex=True)
columna

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  columna['text'] = columna['text'].str.replace(r'[|\^&+\-%*/=!>]', '', regex=True)


Unnamed: 0,text,score
0,"This coffee does NOT come in individual ""PODS""...",3
1,I was a little skeptical after looking at the ...,5
2,Gloia Jeans Butter Toffee is one of my favorit...,5
3,My families and friends love Planters peanuts ...,5
4,"Organic Valley White 1 Milkfat Lowfat Milk, 8...",1
...,...,...
14206,This tea certainly exceeded my expectations W...,5
14207,I had these at a conference once. I have been ...,4
14208,I have enjoyed using the maple flavor. It adds...,5
14209,"When I recently started using Kcups, wasn't su...",5
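
The warning shown before this output appears because `columna` was created by selecting columns from `df`, and pandas cannot tell whether the assignment is meant to modify `df` as well. One common way to avoid it is to take an explicit copy. The sketch below uses a small made-up DataFrame as a stand-in for the real one, and passes `regex=True` explicitly because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import pandas as pd

# Hypothetical stand-in for the reviews DataFrame
df = pd.DataFrame({
    'id': [1, 2],
    'text': ["Kona 100% coffee!", "K-cups are great"],
    'score': [5, 4],
})

# .copy() makes `columna` independent of `df`, so assigning to it is unambiguous
columna = df[['text', 'score']].copy()

# Same character class as in the cell above, applied as a regular expression
columna['text'] = columna['text'].str.replace(r'[|\^&+\-%*/=!>]', '', regex=True)

print(columna['text'].tolist())  # ['Kona 100 coffee', 'Kcups are great']
print(df['text'].tolist())       # ['Kona 100% coffee!', 'K-cups are great']
```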


In [48]:
columna

0        This coffee does NOT come in individual "PODS"...
1        I was a little skeptical after looking at the ...
2        Gloia Jeans Butter Toffee is one of my favorit...
3        My families and friends love Planters peanuts ...
4        Organic Valley White 1  Milkfat Lowfat Milk, 8...
                               ...                        
14206    This tea certainly exceeded my expectations  W...
14207    I had these at a conference once. I have been ...
14208    I have enjoyed using the maple flavor. It adds...
14209    When I recently started using Kcups, wasn't su...
14210    I ordered these as I have ordered from Amazon ...
Name: text, Length: 14211, dtype: object
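
The cells above remove only a handful of symbols. The remaining cleaning steps listed in the challenge (`<br>` variants, digits, other punctuation, lowercasing) could be sketched as a single function. This is one possible approach, not the reference solution; `clean_text` is a hypothetical helper name, and the exact set of characters to drop is a judgment call.

```python
import re

def clean_text(text):
    """Apply the challenge's cleaning steps to a single review string."""
    # <br>, <br/>, <br />, < br / > and uppercase variants become a space
    text = re.sub(r'<\s*br\s*/?\s*>', ' ', text, flags=re.IGNORECASE)
    # Remove digits
    text = re.sub(r'[0-9]', '', text)
    # Remove anything that is not a word character or whitespace
    # (note: \w keeps underscores)
    text = re.sub(r'[^\w\s]', '', text)
    # Collapse runs of whitespace left behind by the removals
    text = re.sub(r'\s+', ' ', text).strip()
    return text.lower()

example = clean_text("Great taste!<br />I bought 12 bags, 100% worth it.")
print(example)  # great taste i bought bags worth it
```

With the real data, the function could be applied with `columna['text'].apply(clean_text)`, and the result saved with `columna[['text', 'score']].to_csv(..., index=False)`, choosing whatever output filename suits the upcoming challenges.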