## Challenge 2: Regex

### 1. Objectives:
    - Practice regular expressions with a real dataset
 
---
    
### 2. Walkthrough:

We are going to practice regular expressions using a dataset called 'amazon_fine_food_reviews-clean.csv'. It is a subset of a larger dataset that comes from [this source](https://www.kaggle.com/snap/amazon-fine-food-reviews). It contains reviews of many different products written by Amazon users. The 'text' column holds the text of each review, and that is the column we are interested in.

We will practice regular expressions on that column. Each search you run produces a new subset of the data with a specific size. When you finish, compare the sizes of your subsets with those of your classmates to check that your answers are correct.

Your challenge, then, is to obtain subsets of the data with the following characteristics:

1. All reviews that contain the word 'food' (in lowercase).
2. All reviews that contain a two-digit number.
3. All reviews that contain a percentage (one or more digits followed by a percent sign).
4. All reviews that start with the word 'Dog' or 'dog'.
5. All reviews that end with the fragment 'awesome.' (note that there is specifically a period after the word 'awesome').
6. All reviews that contain the words 'horrible' **or** 'terrible'.
7. All reviews that contain **only** lowercase letters.

After these explorations, clean your dataset by removing the following from all of your texts:

1. Anything resembling `<br>` or `<br/>` (check for variations of these tags, with spaces in between, for example)
2. Punctuation and symbols in general
3. Digits
4. Anything else that does not seem relevant to our natural language analysis

Also convert all letters to lowercase to make the dataset uniform.

Save your dataset as a 'csv' file so that you can use it in the upcoming challenges (make sure to include **at least** the 'text' and 'score' columns).
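
The cleanup steps above could be sketched roughly as follows. This is one possible approach, not a reference solution: the helper name `clean_text`, the exact patterns, and the sample reviews are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; adjust them to whatever you consider noise.
BR_TAG = re.compile(r"<\s*br\s*/?\s*>", flags=re.IGNORECASE)  # <br>, <br/>, < br / >
NON_LETTERS = re.compile(r"[^a-z\s]")  # punctuation, digits, other symbols
EXTRA_SPACES = re.compile(r"\s+")      # runs of whitespace left behind


def clean_text(text: str) -> str:
    """Lowercase a review and strip br tags, punctuation, and digits."""
    text = BR_TAG.sub(" ", text)
    text = text.lower()
    text = NON_LETTERS.sub(" ", text)
    return EXTRA_SPACES.sub(" ", text).strip()


# Made-up reviews standing in for the 'text' column.
raw_reviews = [
    "Great taste!<br />Only 90% sugar-free.",
    "I paid $12 for 2 bags<BR>Worth it.",
]
cleaned = [clean_text(review) for review in raw_reviews]
print(cleaned)
```

With the real data, the same helper could be applied with `df['text'].apply(clean_text)`, followed by `df[['text', 'score']].to_csv(..., index=False)` using a filename of your choice. Note that this sketch also splits contractions such as "don't" into two tokens, which may or may not be what you want.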

In [1]:
import pandas as pd
import re

In [2]:
df = pd.read_csv('../Datasets/amazon_fine_food_reviews-clean.csv')

df.head()

| Unnamed: 0 | id | product_id | user_id | profile_name | helpfulness_numerator | helpfulness_denominator | score | time | summary | text |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 258510 | B00168V34W | A1672LH9S1XO70 | Lorna J. Loomis "Canadian Dog Fancier" | 13 | 14 | 3 | 1266796800 | Misleading to refer to "PODS" | This coffee does NOT come in individual "PODS"... |
| 1 | 207915 | B000CQID2Y | A42CJC66XO0H7 | Scott Schimmel "A Butterfly Dreaming" | 2 | 2 | 5 | 1279497600 | Delicious | I was a little skeptical after looking at the ... |
| 2 | 522649 | B007TJGZ0Y | A16QZBG2UN6Z3X | Toology "Toology" | 0 | 0 | 5 | 1335830400 | One of my favs | Gloia Jeans Butter Toffee is one of my favorit... |
| 3 | 393368 | B000W7PUOW | A3J21CQZG60K35 | Hsieh Pei Hsuan | 2 | 2 | 5 | 1265673600 | Tasty!! | My families and friends love Planters peanuts ... |
| 4 | 178178 | B002FX2IOQ | A1Z7XV6JU0EV8M | Barbara "Barbara" | 1 | 6 | 1 | 1301788800 | Organic Valley White 1 % Milkfat Lowfat Milk, ... | Organic Valley White 1 % Milkfat Lowfat Milk, ... |


In [3]:
# Keep one review text per id (the last one if an id repeats), as a Series indexed by id.
df2 = df.groupby("id")["text"].last()
df2.head()

id
58    It is chocolate, what can I say.  Great variet...
61    Watch your prices with this.  While the assort...
73    I ordered two of these and two of raspberry la...
86    We have three dogs and all of them love this f...
94    My golden retriever is one of the most picky d...
Name: text, dtype: object

In [4]:
# All reviews that contain the word 'food' (in lowercase).
df2[df2.str.contains("food", case=True)]

id
86        We have three dogs and all of them love this f...
94        My golden retriever is one of the most picky d...
211       I started my cat on Felidae Platinum about 3 w...
214       As with canidae, Felidae has also changed thei...
238       The recommendation when we bought our puppies ...
                                ...                        
567926    Fish has that all important DHA that helps bra...
567928    This has been my baby's (now 21 months) favori...
568001    My 18+ month old enjoys this food very much. T...
568024    I have been making my son's baby food but need...
568062    Very surprised.  Expected it to be thicker.  M...
Name: text, Length: 1774, dtype: object
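
Because the exercise asks you to compare subset sizes, it can help to read the count straight from the boolean mask. A minimal sketch on a made-up Series (the sample reviews and variable names are assumptions, not part of the dataset):

```python
import pandas as pd

# Made-up reviews standing in for df2.
reviews = pd.Series([
    "My dog loves this food.",
    "Food arrived late.",
    "Great coffee!",
    "Best seafood chowder.",
])

# Case-sensitive search; regex=False treats the pattern as a literal substring.
mask = reviews.str.contains("food", case=True, regex=False)

subset = reviews[mask]
print(len(subset))      # number of matching reviews
print(int(mask.sum()))  # same count, taken from the mask
```

A plain substring search also matches words such as 'seafood'. If only the standalone word should count, a pattern like `r"\bfood\b"` with the default `regex=True` is one option.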

All reviews that contain a two-digit number.


In [5]:
df2[df2.str.contains("[0-9][0-9]")]

id
94        My golden retriever is one of the most picky d...
238       The recommendation when we bought our puppies ...
369       "Nantucket Blend coffee is one of my favorites...
664       We have been using 17-Day Diet guided by Low-G...
673       I am very disappointed with this product becau...
                                ...                        
567928    This has been my baby's (now 21 months) favori...
568001    My 18+ month old enjoys this food very much. T...
568110    Absolutely delicious.  My dad bought these Pir...
568141    I have always bought Starbucks coffee from a s...
568215    I was very satisfied with my purchase of caram...
Name: text, Length: 3019, dtype: object
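
The pattern `[0-9][0-9]` matches any two consecutive digits, so it also fires inside longer numbers such as `2010`. If the goal is a number of exactly two digits, word boundaries are one option. A small sketch with Python's `re` module on made-up strings (the sample data is an assumption):

```python
import re

samples = ["aged 21 months", "ordered in 2010", "only 5 left", "a 17-Day diet"]

two_consecutive = re.compile(r"[0-9][0-9]")  # any two digits in a row
exactly_two = re.compile(r"\b[0-9]{2}\b")    # a standalone two-digit number

loose = [s for s in samples if two_consecutive.search(s)]
strict = [s for s in samples if exactly_two.search(s)]

print(loose)
print(strict)
```

Either pattern can be passed to `df2.str.contains(...)`. Which one is right depends on how "two-digit number" is read, so agree on the interpretation before comparing counts.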

In [7]:
# All reviews that contain a percentage (one or more digits followed by a percent sign).

df2[df2.str.contains("[0-9]%")]

id
664       We have been using 17-Day Diet guided by Low-G...
1322      This cocoa powder has a deep, fruity taste- I ...
2154      These are awesome and if you know where to fin...
3596      Over 90% of dry dog food is very unhealthy and...
5610      A 25% price increase since the end of November...
                                ...                        
564046    I purchased this item after reading through th...
565533    Love the taste, first off!  I have only tried ...
566074    I love peanut butter but hate all the calories...
566387    There are basically two ways to look at Kraft ...
566427    We switched a long time ago to only eating 100...
Name: text, Length: 388, dtype: object
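
`[0-9]%` needs only one digit before the `%` to flag a review. To see the percentages themselves, `[0-9]+%` captures the whole number. A sketch with `re.findall` on a made-up sentence (the sample text is an assumption):

```python
import re

text = "Over 90% of dry dog food is bad, but this one is 100% natural and 5 % off."

# One or more digits immediately followed by a percent sign.
percentages = re.findall(r"[0-9]+%", text)
print(percentages)
```

The value written as `5 %` is not captured because of the space before the sign. In pandas, `df2.str.findall(r"[0-9]+%")` applies the same extraction to every review.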

In [13]:
# All reviews that start with the word 'Dog' or 'dog'.

df2[df2.str.contains("^Dog|^dog")]

id
21456     Dogs LOVE Greenies!  They go crazy if they eve...
89608     Dogs liked it "okay."  Did NOT give it th the ...
160700    Dogs probably don't really want to spend most ...
255027    Dog loves these. Eats them quickly and without...
502312    Dogs love it. It smells like chicken. I have b...
Name: text, dtype: object
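
The alternation `^Dog|^dog` can be written more compactly with a character class. It also matches 'Dogs', since nothing constrains what follows the three letters. A sketch on made-up strings (sample data assumed) comparing the compact form with a whole-word variant:

```python
import re

samples = [
    "Dogs love it.",
    "dog treats arrived.",
    "Dog loves these.",
    "My dog is picky.",
    "Dogma aside, tasty.",
]

starts_with_dog = re.compile(r"^[Dd]og")         # same matches as ^Dog|^dog
starts_with_word_dog = re.compile(r"^[Dd]og\b")  # 'dog' as a whole word only

prefix_hits = [s for s in samples if starts_with_dog.search(s)]
word_hits = [s for s in samples if starts_with_word_dog.search(s)]

print(prefix_hits)
print(word_hits)
```

Whether plurals such as 'Dogs' should count is a reading of the task, so the stricter `\b` version is offered only as an alternative.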

In [24]:
# All reviews that end with the fragment 'awesome.' (note that there is specifically a period after the word 'awesome').
# The period is escaped so that it matches a literal '.' rather than any character.

df2[df2.str.contains(r"awesome\.$")]

id
101543    The toy seems pretty durable which is a big wi...
109285    If you want to improve your milk production, t...
152625    I love Tiger Sauce! I eat it with cream cheese...
263973    I am a frequent purchaser on Amazon, yet I can...
265598    I had never heard of Dende Oil before I read a...
312084    these are awesome. though they're gluten free,...
327377    My boyfriend and I have been trying various bo...
466528    These are decent, but if you want a realllly g...
531750    This is a deliscious drink.I have trouble keep...
Name: text, dtype: object
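
In a regular expression, an unescaped `.` matches any character, so a pattern that must end in a literal period needs `\.`. A short sketch on made-up strings (sample data assumed) showing the difference:

```python
import re

samples = [
    "These are awesome.",
    "These are awesome!",
    "Simply awesome",
    "awesome. Would buy again.",
]

any_char = re.compile(r"awesome.$")      # '.' matches any single character
literal_dot = re.compile(r"awesome\.$")  # '\.' matches only a period

any_char_hits = [s for s in samples if any_char.search(s)]
literal_hits = [s for s in samples if literal_dot.search(s)]

print(any_char_hits)
print(literal_hits)
```

The unescaped version also accepts 'awesome!', which is why the two patterns can produce subsets of different sizes on real data.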

In [31]:
# All reviews that contain the words 'horrible' or 'terrible'.
df2[df2.str.contains("horrible|terrible")]

id
194       These little guys are tasty and refreshing.<br...
214       As with canidae, Felidae has also changed thei...
2737      This is  terrible popcorn! No taste, poor popp...
3957      I love the product.  But, the bottle design is...
9384      I thought this coffee was really horrible. Had...
                                ...                        
545651    I first purchased this Danielle product at all...
546846    Our 6-year old Vizsla has been eating Canidae ...
554686    I orders this fruitcake from amazon.I thought ...
559328    I have eaten the varieties of these for years ...
561327    I read all the other reviews...I'm in the camp...
Name: text, Length: 216, dtype: object
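
The pattern `horrible|terrible` is case-sensitive, so it misses 'Horrible' or 'TERRIBLE'. A sketch on made-up strings (sample data assumed) that factors the shared ending into a non-capturing group and ignores case:

```python
import re

samples = [
    "This is terrible popcorn!",
    "Horrible aftertaste.",
    "Not bad at all.",
]

# (?:...) groups the alternatives without creating a capture group.
pattern = re.compile(r"(?:horr|terr)ible", flags=re.IGNORECASE)

hits = [s for s in samples if pattern.search(s)]
print(hits)
```

In pandas, `df2.str.contains("horrible|terrible", case=False)` gives the case-insensitive variant. Whether capitalized forms should count is a reading of the task, so expect a different subset size from the case-sensitive search.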

In [18]:
# All reviews that contain only lowercase letters.
# Note: '^[a-z]' only checks that the review starts with a lowercase letter.
df2[df2.str.contains("^[a-z]")]

id
1801      they taste awesome! esp the salt n pepper one!...
2805      bar harbor clam chowder is an excellent produc...
3568      very good corn product, we will be eating this...
6485      tastes like dry brownie w/ invisible pieces of...
6500      they are fabulous and not bad for you, what el...
                                ...                        
563570    i love this cookie it was so soft and chewy an...
565492    start every morning w/click, 4oz ff milk and d...
566027    read about manuka honey online, and have bough...
566735    this is the best product i have ever used its ...
567754    i have tried many brands of the hot chocolates...
Name: text, Length: 484, dtype: object
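
`^[a-z]` is anchored only at the start, so it tests just the first character of each review. Two stricter readings of "only lowercase letters" can be sketched with `fullmatch` on made-up strings (the sample data and both interpretations are assumptions about what the exercise intends):

```python
import re

samples = ["tasty", "very good corn product", "they taste awesome! Love it", "Great"]

starts_lower = re.compile(r"^[a-z]")  # first character is a lowercase letter
only_lower = re.compile(r"[a-z]+")    # nothing but lowercase letters
no_upper = re.compile(r"[^A-Z]+")     # anything except uppercase letters

first_char_hits = [s for s in samples if starts_lower.search(s)]
only_lower_hits = [s for s in samples if only_lower.fullmatch(s)]
no_upper_hits = [s for s in samples if no_upper.fullmatch(s)]

print(first_char_hits)
print(only_lower_hits)
print(no_upper_hits)
```

The strictest reading rejects spaces and punctuation, so on real reviews it will match very little. The "no uppercase letters" reading is usually closer to the intent, and `df2.str.fullmatch(r"[^A-Z]+")` applies it to every review in pandas.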