The file is separated into four parts:
- general function and variable definitions for loading the data
- one part per data set (Advocate, Matched, RateBeer), each defining the paths of its txt files and then processing the data

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
import pandas as pd
import os


dataset_path = '/content/drive/My Drive/AdaDataSet'

ADVOCATE = "advocate"
RATEBEER = "ratebeer"
MATCHED = "matched"

ADVOCATE_PATH = os.path.join(dataset_path, ADVOCATE)
RATEBEER_PATH = os.path.join(dataset_path, RATEBEER)
MATCHED_PATH = os.path.join(dataset_path, MATCHED)

**Functions to convert the txt file to a csv file**

The first function parses the txt file into a list of dictionaries, one per record.
The second one converts the txt file directly to a CSV file, writing it in small batches to avoid memory issues on Google Colab.
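
To make the expected input concrete, here is a minimal sketch of the record layout that both parsers appear to assume: blocks of `key: value` lines separated by blank lines. The sample text and the `iter_records` helper are hypothetical and only illustrate the idea; they are not part of the data set or of the functions defined in this notebook.

```python
import io

# Hypothetical sample in the assumed layout:
# "key: value" lines, with a blank line between records.
sample = io.StringIO(
    "beer_name: Example Stout\n"
    "abv: 4.6\n"
    "text: Black: big head\n"
    "\n"
    "beer_name: Example Ale\n"
    "abv: 5.2\n"
)

def iter_records(lines):
    """Yield one dict per blank-line-separated block (hypothetical helper)."""
    record = {}
    for line in lines:
        line = line.strip()
        if not line:
            if record:
                yield record
                record = {}
        elif ": " in line:
            # Split on the first ': ' only, so values may contain ': ' themselves
            key, value = line.split(": ", 1)
            record[key] = value
    if record:  # last record when the input does not end with a blank line
        yield record

records = list(iter_records(sample))
print(records)
```

Every value comes out as a string at this stage; numeric columns are only inferred later, when the CSV is read back with `pd.read_csv`.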

In [8]:
def parse_txt_file(file_path):
    data = []  # List to hold dictionaries for each record
    current_record = {}  # Dictionary to hold current record

    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()  # Remove leading/trailing whitespace
            if not line:  # If the line is empty, the record is complete
                if current_record:  # If we have data in current_record
                    data.append(current_record)  # Add record to the list
                    current_record = {}  # Reset for the next record
            else:
                # Only process the line if it contains a ': '
                if ': ' in line:
                    key, value = line.split(': ', 1)
                    current_record[key] = value  # Add key-value to current_record

    # To capture the last record if the file doesn't end with an empty line
    if current_record:
        data.append(current_record)

    return data


In [9]:
import csv

def parse_txt_file_to_csv(file_path, output_csv_path, batch_size=500):
    batch = []  # Temporary storage for each batch
    current_record = {}  # Dictionary to hold current record

    with open(file_path, 'r', encoding='utf-8') as f, open(output_csv_path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = None

        for line in f:
            line = line.strip()
            if not line:
                if current_record:
                    batch.append(current_record)
                    current_record = {}

                # Write the batch to CSV if it reaches the batch size
                if len(batch) >= batch_size:
                    if writer is None:  # Initialize writer on first batch
                        writer = csv.DictWriter(csvfile, fieldnames=batch[0].keys())
                        writer.writeheader()
                    writer.writerows(batch)
                    batch.clear()  # Clear batch after writing

            elif ': ' in line:
                key, value = line.split(': ', 1)
                current_record[key] = value

        # Write any remaining records in the last batch
        if current_record:
            batch.append(current_record)
        if batch:
            if writer is None:
                writer = csv.DictWriter(csvfile, fieldnames=batch[0].keys())
                writer.writeheader()
            writer.writerows(batch)

    print(f"Data saved to {output_csv_path}")
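
One consequence of taking `fieldnames` from the first record of the first batch: `csv.DictWriter` writes an empty field for a key that a later record lacks, but by default it raises `ValueError` when a record carries a key that is not in the header. The sketch below shows both cases with made-up records; whether the actual txt files contain such records has not been checked here.

```python
import csv
import io

# Hypothetical records: the second lacks "text", the third has an extra "review" key.
records = [
    {"beer_name": "A", "rating": "3.5", "text": "ok"},
    {"beer_name": "B", "rating": "4.0"},
    {"beer_name": "C", "rating": "4.2", "text": "good", "review": "True"},
]

buffer = io.StringIO()
# Header is fixed from the first record, as in parse_txt_file_to_csv
writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
writer.writeheader()
writer.writerows(records[:2])  # the missing "text" becomes an empty field

try:
    writer.writerow(records[2])  # "review" is not in fieldnames
    extra_key_error = None
except ValueError as exc:
    extra_key_error = str(exc)

csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
print(extra_key_error)
```

If heterogeneous records turn out to be a problem, two possible options are passing `extrasaction='ignore'` to `csv.DictWriter` (which silently drops the extra keys) or collecting the union of all keys in a first pass over the file.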

**ADVOCATE DATA SET**

In [10]:
ratings_a = 'ratings-advocate.txt'
reviews_a = 'reviews-advocate.txt'

The Advocate reviews file is the only one that uses the first method: the list of dictionaries is loaded into a DataFrame before being written to a CSV file.

In [None]:
records = parse_txt_file(os.path.join(ADVOCATE_PATH, reviews_a))

df = pd.DataFrame(records)
df.sample(10)

Unnamed: 0,beer_name,beer_id,brewery_name,brewery_id,style,abv,date,user_name,user_id,appearance,aroma,palate,taste,overall,rating,text
334789,Baltika #7 Export,4695,Baltika Breweries,401,Dortmunder / Export Lager,5.4,1150624800,ernie,ernie.79195,2.0,2.0,2.5,2.5,2.5,2.35,Another try to brew a Dortmunder Export for th...
1523606,Guard Dog Porter,66315,Marley's Brewery And Grille,24870,American Porter,5.6,1297767600,CaptinRedBeard,captinredbeard.413445,4.0,4.5,4.0,4.5,4.0,4.32,Pours a very dark brown almost black color. No...
2109691,New Holland The Poet,8322,New Holland Brewing Company,335,Oatmeal Stout,5.2,1277200800,Stacy53,stacy53.450433,4.0,3.0,3.5,4.0,4.0,3.71,New Holland- The Poet- Bottled in 2010Less tha...
23577,Bridge Of Allan Glencoe Organic Wild Oat Stout,22635,Bridge Of Allan Brewery Ltd,3396,Oatmeal Stout,4.5,1291806000,StJamesGate,stjamesgate.163714,3.0,4.0,4.0,3.5,4.0,3.74,Pours opaque black with oxblood highlights and...
2525865,Duchesse De Bourgogne,1745,Brouwerij Verhaeghe,641,Flanders Red Ale,6.0,1401962400,hagbergl,hagbergl.364039,4.25,4.5,3.75,4.5,4.25,4.36,750 mL green bottle into my Rodenbach Grand Cr...
324268,Spaten Oktoberfestbier Ur-Märzen,582,Spaten-Franziskaner-Bräu,142,Märzen / Oktoberfest,5.9,1227006000,Ruaidhri,ruaidhri.10786,4.0,3.0,3.0,4.0,4.0,3.66,Picked this beer up during Oktoberfest 08. Pou...
1572275,Man-O-Awe,111010,3 Floyds Brewing Co.,26,American Pale Ale (APA),6.0,1404813600,Rifugium,rifugium.304205,3.75,3.5,3.5,3.5,3.5,3.52,Thanks to Ray for the share!Hazy golden-orange...
1136547,He'Brew Funky Jewbelation '16,205937,Shmaltz Brewing Company,262,American Wild Ale,9.5,1471514400,ajilllau,ajilllau.1034911,3.0,2.5,2.25,2.25,3.0,2.51,This beer was a total miss for me. it tasted ...
300473,Mahr's Pilsner,2410,Mahr's Bräu,428,German Pilsener,4.9,1411207200,Chico1985,chico1985.349304,3.5,3.5,4.0,3.5,3.5,3.55,"Pre-gaming for a German beer tasting with, wel..."
1621968,Rolling Rock Extra Pale,567,Latrobe Brewing Co.,174,American Adjunct Lager,4.6,1137236400,nlmartin,nlmartin.31081,3.0,2.5,3.0,2.5,3.0,2.68,Poured from the bottle into a pub glass.Appear...


In [None]:
save_path = os.path.join(ADVOCATE_PATH, 'reviews-advocate.csv')
df.to_csv(save_path, index=False)


In [None]:
save_path = os.path.join(ADVOCATE_PATH, 'reviews-advocate.csv')
save_path

'/content/drive/My Drive/AdaDataSet/advocate/reviews-advocate.csv'

In [None]:
df = pd.read_csv(save_path)
df.sample(10)

Unnamed: 0,beer_name,beer_id,brewery_name,brewery_id,style,abv,date,user_name,user_id,appearance,aroma,palate,taste,overall,rating,text
2087672,Decadent Dark Chocolate Ale,78023,Atwater Brewery,15280,American Stout,4.5,1466244000,goodbeer4cheap,goodbeer4cheap.705186,4.25,4.5,3.0,3.75,3.75,3.89,Enjoyed from a beer mug poured from a bottle.L...
524463,W-n-B Coffee Oatmeal Imperial Stout,21950,Terrapin Beer Company,2372,American Double / Imperial Stout,9.4,1450350000,tbryan5,tbryan5.695552,4.25,4.25,3.75,4.5,4.5,4.35,Pours like a thick coffee. And it smells like ...
1242717,Trademark Pale Ale,15053,Breckenridge Brewery,2137,American Pale Ale (APA),5.7,1093428000,TastyTaste,tastytaste.335,3.0,3.0,4.0,4.0,3.5,3.6,Written From Notes Compiled 6/04-8/04:Light ca...
887634,Blazing Arrow,235141,Monkish Brewing Co.,28657,American Double / Imperial IPA,8.1,1466676000,CNoj012,cnoj012.904404,4.0,4.5,4.25,4.5,4.5,4.45,This beer pours a cloudy dark golden orange co...
242196,Arcobräu Urfass Premium Hell,20923,Arcobräu Gräfliches Brauhaus,5709,Dortmunder / Export Lager,5.2,1444644000,FLima,flima.688307,3.25,3.25,3.5,3.25,3.25,3.28,Bright golden color with a thin thick head wit...
2153984,Gone Fishin',51602,Beer Valley Brewing Co.,14984,English Dark Mild Ale,4.0,1277200800,Pecorasc,pecorasc.157575,3.0,4.0,4.5,4.0,5.0,4.19,22oz bottle poured into a tulip. A style I alw...
764950,Santa's Little Helper,34203,Port Brewing,13839,Russian Imperial Stout,10.5,1244455200,JMBSH,jmbsh.25234,3.5,4.0,3.5,3.5,3.5,3.62,A - Dark ruby with not much of a tan head (tho...
1903213,Hunahpu's Imperial Stout - Double Barrel Aged,110635,Cigar City Brewing,17981,American Double / Imperial Stout,11.0,1392721200,Alieniloquium,alieniloquium.277522,4.0,4.25,4.0,4.5,4.75,4.41,750 mL bottle poured into a snifter.Appearance...
2403498,Cantillon Cuvée Des Champions,21683,Brasserie Cantillon,388,Gueuze,5.0,1309082400,billshmeinke,billshmeinke.319944,3.0,4.0,4.5,4.0,4.5,4.09,Consumed at Stone Sour Fest.Murky golden color...
502708,Trepidation - Gin Barrel Aged,175046,Vintage Brewing Company,22243,Quadrupel (Quad),11.9,1441965600,Dactrius,dactrius.673749,3.75,4.25,4.25,4.25,4.25,4.22,The gin flavor isn't very intense and hasn't k...


In [12]:
parse_txt_file_to_csv(os.path.join(ADVOCATE_PATH, ratings_a), os.path.join(ADVOCATE_PATH, 'ratings-advocate.csv'), batch_size=500)

Data saved to /content/drive/My Drive/AdaDataSet/advocate/ratings-advocate.csv


In [14]:
df = pd.read_csv(os.path.join(ADVOCATE_PATH, 'ratings-advocate.csv'))
df.sample(10)

Unnamed: 0,beer_name,beer_id,brewery_name,brewery_id,style,abv,date,user_name,user_id,appearance,aroma,palate,taste,overall,rating,text,review
7723254,Reliquary,89539,Pipeworks Brewing Company,28178,Belgian IPA,9.0,1413540000,BrentSeifts,brentseifts.783922,,,,,,3.75,,False
6862400,Pilgrim's Dole,14954,New Holland Brewing Company,335,Wheatwine,12.0,1415185200,Commando,commando.750000,,,,,,4.0,,False
2330629,Racer 5 India Pale Ale,2751,Bear Republic Brewery,610,American IPA,7.5,1275127200,aerozeppl,aerozeppl.101084,4.0,3.5,4.0,4.0,4.5,3.98,I don't think I have had this beer since my re...,True
3756453,Avalanche Ale,2297,Breckenridge Brewery,2137,American Amber / Red Ale,4.4,1113559200,scottum,scottum.1401,4.0,3.0,4.0,3.5,4.0,3.56,A bottle of ale as per BIF16Pours out a sexy a...,True
3334759,Three Philosophers Belgian Style Blend (Quadru...,3457,Brewery Ommegang,42,Quadrupel (Quad),9.7,1321527600,freewill35,freewill35.168208,,,,,,4.0,,False
283250,Blanche De Chambly,31,Unibroue,22,Witbier,5.0,1445508000,adamranders,adamranders.991398,3.75,4.0,4.0,4.25,4.0,4.09,,False
8355964,Bourbon Barrel Aged Empire Imperial Stout,85513,Cutters Brewing Company,26521,American Double / Imperial Stout,10.0,1348308000,eyeenjoybeer,eyeenjoybeer.568927,4.5,5.0,4.0,4.5,4.0,4.47,This is the first barrel-aged offering from th...,True
5297923,90 Minute IPA,2093,Dogfish Head Brewings & Eats,64,American Double / Imperial IPA,9.0,1408183200,shiznit7,shiznit7.792715,,,,,,4.24,,False
7532454,Bourbon County Brand Coffee Stout,57747,Goose Island Beer Co.,1146,American Double / Imperial Stout,13.4,1482145200,tmryan21,tmryan21.776533,5.0,4.75,5.0,4.75,5.0,4.84,,False
1456335,A Little Sumpin' Sumpin' Ale,49789,Lagunitas Brewing Company,220,American Pale Wheat Ale,7.5,1347271200,Beercounter1,beercounter1.693658,,,,,,4.25,,False


**MATCHED DATA SET**

In [42]:
ratings_ba = 'ratings_ba.txt'
ratings_rb = 'ratings_rb.txt'

ratings_ba_with_text = 'ratings_with_text_ba.txt'
ratings_rb_with_text = 'ratings_with_text_rb.txt'


In [17]:
parse_txt_file_to_csv(os.path.join(MATCHED_PATH, ratings_ba), os.path.join(MATCHED_PATH, 'ratings_ba.csv'), batch_size=500)

Data saved to /content/drive/My Drive/AdaDataSet/matched/ratings_ba.csv


In [43]:
parse_txt_file_to_csv(os.path.join(MATCHED_PATH, ratings_rb), os.path.join(MATCHED_PATH, 'ratings_rb.csv'), batch_size=500)

Data saved to /content/drive/My Drive/AdaDataSet/matched/ratings_rb.csv


In [21]:
parse_txt_file_to_csv(os.path.join(MATCHED_PATH, ratings_ba_with_text), os.path.join(MATCHED_PATH, 'ratings_ba_with_text.csv'), batch_size=500)

Data saved to /content/drive/My Drive/AdaDataSet/matched/ratings_ba_with_text.csv


In [44]:
parse_txt_file_to_csv(os.path.join(MATCHED_PATH, ratings_rb_with_text), os.path.join(MATCHED_PATH, 'ratings_rb_with_text.csv'), batch_size=500)

Data saved to /content/drive/My Drive/AdaDataSet/matched/ratings_rb_with_text.csv


In [24]:
df = pd.read_csv(os.path.join(MATCHED_PATH, 'ratings_ba.csv'))
df.sample(10)

Unnamed: 0,beer_name,beer_id,brewery_name,brewery_id,style,abv,date,user_name,user_id,appearance,aroma,palate,taste,overall,rating,text,review
640697,Spectrally Macabre,246966,Inoculum Ale Works,43919,Gose,3.1,1476439200,joemochas,joemochas.262075,4.5,4.75,4.5,4.5,4.5,4.56,,False
392441,Black Jack Porter,1251,Left Hand Brewing Company,418,English Porter,6.8,1351072800,mcsauter82,mcsauter82.688630,,,,,,3.5,,False
372235,Newburgh Checkpoint Charlie,131786,Newburgh Brewing Company,29419,Berliner Weissbier,3.0,1419159600,Ralphs66,ralphs66.541714,3.25,3.25,3.25,3.25,3.25,3.25,,False
142996,Strawberry Harvest Lager,23505,Abita Brewing Co.,3,Fruit / Vegetable Beer,4.2,1341655200,fatbatcat,fatbatcat.115556,,,,,,4.0,,False
851342,India Pale Ale,65038,Westbrook Brewing Co.,24134,American IPA,6.8,1378807200,cjgiant,cjgiant.741623,4.25,4.0,3.75,4.0,4.0,3.99,"Drinking from can, used plastic cup for pour t...",True
349016,Riprap Baltic Porter,70830,Barrier Brewing Company,22928,Baltic Porter,8.1,1434794400,ljbonadonna92,ljbonadonna92.877055,3.75,3.5,4.5,4.25,4.0,4.02,Big roasted notes followed by burnt caramel. T...,True
109817,Big Buck Brown Ale,69648,Black Husky Brewing,23018,American Brown Ale,8.0,1479812400,Benish,benish.723269,4.0,4.0,3.75,3.75,3.75,3.83,Appearance: light brown with pouring but then ...,True
767204,Longboard Island Lager,5328,Kona Brewing Co.,579,American Pale Lager,4.6,1268046000,KTCamm,ktcamm.289047,3.5,3.5,3.5,4.0,4.5,3.9,"Reviewing from tasting notes, purchased at the...",True
744850,Sriracha Hot Stout,147262,Rogue Ales,132,Chile Beer,5.7,1425553200,McMatt7,mcmatt7.836109,4.0,3.25,3.75,3.25,3.5,3.4,,False
421457,Classique,91189,Stillwater Artisanal Ales,22150,Saison / Farmhouse Ale,4.5,1378634400,dcguitar,dcguitar.343375,,,,,,3.75,,False


In [46]:
df = pd.read_csv(os.path.join(MATCHED_PATH, 'ratings_rb.csv'))
df.head(10)

Unnamed: 0,beer_name,beer_id,brewery_name,brewery_id,style,abv,date,user_name,user_id,appearance,aroma,palate,taste,overall,rating,text
0,Ards Bally Black Stout,155699,Ards Brewing Co.,13538,Stout,4.6,1429178400,ciaranc,151109,3,7,2,8,17,3.7,"Bottle, gift from Aaron. Black, big head, lots..."
1,Ards Bally Black Stout,155699,Ards Brewing Co.,13538,Stout,4.6,1427796000,Rowlymo,198957,4,7,4,8,15,3.8,500ml Bottle in Bittles bar Belfast. Chocolate...
2,Ards Bally Black Stout,155699,Ards Brewing Co.,13538,Stout,4.6,1421665200,Don2711,285162,4,8,5,7,17,4.1,Great one and made very local to me. Bottle bo...
3,Ards Bally Black Stout,155699,Ards Brewing Co.,13538,Stout,4.6,1380621600,genegenie,224156,5,8,3,5,19,4.0,"Deep roasted aroma, good dark brown/black colo..."
4,Ards Bally Black Stout,155699,Ards Brewing Co.,13538,Stout,4.6,1371549600,kiwianer,88501,4,7,4,7,14,3.6,"The head is medium, the body black. It smells ..."
5,Ards Bally Black Stout,155699,Ards Brewing Co.,13538,Stout,4.6,1321527600,Beersiveknown,128086,5,7,5,7,16,4.0,Cask@ Belfast beer fest. Dark brown black with...
6,Ards Pig Island Pale Ale,160664,Ards Brewing Co.,13538,Bitter,5.2,1382436000,Beersiveknown,128086,4,7,4,5,13,3.3,"Bottle at Bittles Bar, BelfastHazy amber brown..."
7,Ards Pig Island Pale Ale,160664,Ards Brewing Co.,13538,Bitter,5.2,1371549600,kiwianer,88501,4,7,3,5,14,3.3,"The head is stable, the body golden, orange. I..."
8,Ards Pig Island Pale Ale,160664,Ards Brewing Co.,13538,Bitter,5.2,1385550000,genegenie,224156,5,8,4,9,18,4.4,Hard to find but worth it when you do. A refre...
9,Ards Pig Island Pale Ale,160664,Ards Brewing Co.,13538,Bitter,5.2,1339581600,visionthing,91324,4,7,4,7,14,3.6,"50cl bottle (4,8% ABV) at Bittles Bar, Belfast..."


In [26]:
df = pd.read_csv(os.path.join(MATCHED_PATH, 'ratings_ba_with_text.csv'))
df.sample(10)

Unnamed: 0,beer_name,beer_id,brewery_name,brewery_id,style,abv,date,user_name,user_id,appearance,aroma,palate,taste,overall,rating,text,review
174889,Snow Hands,263683,Haw River Farmhouse Ales,35374,Belgian Strong Dark Ale,10.0,1488970800,MMOSNN,mmosnn.1145541,4.25,4.25,4.0,4.25,4.25,4.23,500ml bottle poured into a stemless wine glass...,True
172612,Ovni Ale,44140,Flat Earth Brewing Company,15642,Bière de Garde,7.1,1430215200,CraigP83,craigp83.910634,3.25,3.0,3.25,3.5,3.5,3.34,Brown but yet clear almost like tanic water in...,False
127058,Hard Wired Nitro Coffee Porter,186042,Left Hand Brewing Company,418,American Porter,6.0,1488711600,sloe_gin,sloe_gin.1193071,5.0,4.0,4.75,5.0,5.0,4.74,Looks like a nighttime drive through the count...,True
107878,F6 Brett Farmhouse Ale,135872,Transmitter Brewing,33902,Saison / Farmhouse Ale,7.5,1419850800,nmann08,nmann08.184925,4.0,3.75,3.75,3.75,3.75,3.77,"From a bottle, pours a bright gold yellow colo...",True
17201,Jhoom,48481,"Yuksom Breweries, Ltd.",314,American Adjunct Lager,4.5,1416567600,matjack85,matjack85.20005,,,,,,3.12,I found this 650ml brown bottle at Sal's Liquo...,True
232470,Wailua Wheat,36594,Kona Brewing Co.,579,American Pale Wheat Ale,5.4,1243072800,TheManiacalOne,themaniacalone.37950,3.5,3.5,4.0,4.0,4.0,3.85,On-tap 5/22/09 at Triple P Sports Bar in Pawtu...,True
231681,Longboard Island Lager,5328,Kona Brewing Co.,579,American Pale Lager,4.6,1291892400,SassyBootblack,sassybootblack.206072,3.0,3.0,3.0,3.5,3.0,3.2,On tap.Appearance: Clear yellow with a thin wh...,True
224247,Captain Sig's Northwestern Ale,51071,Rogue Ales,132,American Amber / Red Ale,6.2,1247824800,Brenden,brenden.198153,4.5,4.0,3.5,4.0,4.0,3.98,The color is a dark brown with deep coppery re...,True
60485,Wakatu Showers,176378,Noble Ale Works,22412,American Double / Imperial IPA,8.8,1435226400,2beerdogs,2beerdogs.14058,3.75,4.0,4.0,4.25,4.0,4.09,"On tap at Iron Press, Anaheim. Sweet tropical ...",True
187893,Pieces Of Eight,161485,Foley Brothers Brewing,30542,American Double / Imperial IPA,8.0,1453460400,BeerAndGasMasks,beerandgasmasks.673685,4.25,4.5,4.25,4.5,4.25,4.41,"Pours a golden amber with a small head, quickl...",True


In [45]:
df_B = pd.read_csv(os.path.join(MATCHED_PATH, 'ratings_rb_with_text.csv'))
df_B.head(10)

Unnamed: 0,beer_name,beer_id,brewery_name,brewery_id,style,abv,date,user_name,user_id,appearance,aroma,palate,taste,overall,rating,text
0,Ards Bally Black Stout,155699,Ards Brewing Co.,13538,Stout,4.6,1429178400,ciaranc,151109,3,7,2,8,17,3.7,"Bottle, gift from Aaron. Black, big head, lots..."
1,Ards Bally Black Stout,155699,Ards Brewing Co.,13538,Stout,4.6,1427796000,Rowlymo,198957,4,7,4,8,15,3.8,500ml Bottle in Bittles bar Belfast. Chocolate...
2,Ards Bally Black Stout,155699,Ards Brewing Co.,13538,Stout,4.6,1421665200,Don2711,285162,4,8,5,7,17,4.1,Great one and made very local to me. Bottle bo...
3,Ards Bally Black Stout,155699,Ards Brewing Co.,13538,Stout,4.6,1380621600,genegenie,224156,5,8,3,5,19,4.0,"Deep roasted aroma, good dark brown/black colo..."
4,Ards Bally Black Stout,155699,Ards Brewing Co.,13538,Stout,4.6,1371549600,kiwianer,88501,4,7,4,7,14,3.6,"The head is medium, the body black. It smells ..."
5,Ards Bally Black Stout,155699,Ards Brewing Co.,13538,Stout,4.6,1321527600,Beersiveknown,128086,5,7,5,7,16,4.0,Cask@ Belfast beer fest. Dark brown black with...
6,Ards Pig Island Pale Ale,160664,Ards Brewing Co.,13538,Bitter,5.2,1382436000,Beersiveknown,128086,4,7,4,5,13,3.3,"Bottle at Bittles Bar, BelfastHazy amber brown..."
7,Ards Pig Island Pale Ale,160664,Ards Brewing Co.,13538,Bitter,5.2,1371549600,kiwianer,88501,4,7,3,5,14,3.3,"The head is stable, the body golden, orange. I..."
8,Ards Pig Island Pale Ale,160664,Ards Brewing Co.,13538,Bitter,5.2,1385550000,genegenie,224156,5,8,4,9,18,4.4,Hard to find but worth it when you do. A refre...
9,Ards Pig Island Pale Ale,160664,Ards Brewing Co.,13538,Bitter,5.2,1339581600,visionthing,91324,4,7,4,7,14,3.6,"50cl bottle (4,8% ABV) at Bittles Bar, Belfast..."


In [47]:
print(len(df))
print(len(df_B))

1020638
1020599


**RATEBEER DATA SET**

In [39]:
ratings = 'ratings.txt'
reviews = 'reviews.txt'

In [40]:
parse_txt_file_to_csv(os.path.join(RATEBEER_PATH, ratings), os.path.join(RATEBEER_PATH, 'ratings.csv'), batch_size=500)

Data saved to /content/drive/My Drive/AdaDataSet/ratebeer/ratings.csv


In [3]:
df = pd.read_csv(os.path.join(RATEBEER_PATH, 'ratings.csv'))
df.sample(10)

Unnamed: 0,beer_name,beer_id,brewery_name,brewery_id,style,abv,date,user_name,user_id,appearance,aroma,palate,taste,overall,rating,text
1283670,Gruthaus Bockwurst-Bock,304257,Gruthaus-Brauerei,17658,Smoked,7.5,1455966000,King_Alex_II,343843,3,8,4,8,15,3.8,Orange brown color with foam ring. Intense woo...
1096678,Kauzen Original 1809,19298,Kauzen-Bräu,3179,Dortmunder/Helles,5.2,1375696800,Maria,19592,3,7,3,7,14,3.4,It’s beautiful yellow golden with a white he...
131506,Dieu du Ciel Péché Mortel,11461,Dieu du Ciel,364,Imperial Stout,9.5,1221127200,nbutler11,73530,5,7,4,8,17,4.1,Pours like motor oil from the bottle with a tr...
508276,Birra del Borgo ReAle,49598,Birra del Borgo (AB InBev),6100,American Pale Ale,6.4,1495447200,Vighals,271988,4,6,3,6,13,3.2,Orangebrun tåkete med høyt tykt eggegult sku...
6708174,Troubadour Magma Special Edition 2015 Triple S...,356203,Brouwerij The Musketeers,11515,Imperial IPA,9.8,1492768800,Keukeman,369853,2,6,3,5,12,2.8,".33l bottle with friends, amber beer with soap..."
548403,Brewfist Bionic,252370,Brewfist,12439,India Pale Ale (IPA),6.0,1416049200,madmitch76,81835,4,7,3,7,13,3.4,26th September 2014Borefts VI. Keg. Lightly ha...
5963108,Freedom Organic Lager,58913,Freedom,6811,Pale Lager,4.8,1143799200,Aarleks,12506,3,3,3,4,7,2.0,"Tap, Bunker, Covent Garden: With my London gui..."
4910807,Gritty McDuffs Original Pub Style,32237,Gritty McDuffs,337,American Pale Ale,4.5,1228647600,Tmoney99,11116,4,6,3,5,12,3.0,"Bottle from World of Beer, Clearwater, FL.Pour..."
5102402,New Holland Zoomer Wit,2016,New Holland Brewing Company,343,Witbier,4.7,1217844000,nuplastikk,58251,4,6,3,8,17,3.8,12oz bottle. Cloudy golden-orange color. Ligh...
2073744,Pyynikin SunRice IPA,388929,Pyynikin Käsityöläispanimo,14197,Imperial IPA,8.2,1458385200,Beerhunter111,227834,3,6,4,6,14,3.3,"33cl bottle from trade with rosenbergh, many t..."


In [4]:
print(len(df))

7122074


In [None]:
# This conversion was run directly on my computer because of file size limits
#parse_txt_file_to_csv(os.path.join(RATEBEER_PATH, reviews), os.path.join(RATEBEER_PATH, 'reviews.csv'), batch_size=500)

In [5]:
df = pd.read_csv(os.path.join(RATEBEER_PATH, 'reviews.csv'))
df.sample(10)

Unnamed: 0,beer_name,beer_id,brewery_name,brewery_id,style,abv,date,user_name,user_id,appearance,aroma,palate,taste,overall,rating,text
1780407,Primátor Premium Dark (Tmavý Leák),17845,Pivovar Náchod (LIF),2349,Dunkel/Tmavý,4.8,1340704800,visionthing,91324,3,6,3,6,13,3.1,33cl bottle (as Primator Premium Dark) from Sy...
4732830,Otter Creek Pale Ale,2429,Otter Creek Brewing,417,American Pale Ale,4.6,1142852400,Fish,7707,3,7,3,6,15,3.4,This is a solid pale ale. Nothing spectacular...
5560203,Mustang Route 66 American Lager,197629,Mustang Brewing Company,10709,Premium Lager,5.0,1384686000,deyholla,86363,3,4,2,3,7,1.9,On tap. Pours a clear amber with an off-white ...
1075854,Wilde Rose Kellerbier,17670,Franken Bräu Lorenz Bauer,2035,Zwickel/Keller/Landbier,,1391252400,Theydon_Bois,208877,4,6,4,6,13,3.3,"Day 2 of the post Xmas Bamberg Jolly, Tap at W..."
5079899,Jolly Pumpkin Maracaibo Especial,43153,Jolly Pumpkin Artisan Ales,4923,Belgian Strong Ale,7.5,1108465200,unclebleen,8288,5,8,4,8,17,4.2,pours a big brown color into a goblet. huge l...
4450812,Perennial Peach Berliner Weisse,180854,Perennial Artisan Ales,13326,Berliner Weisse,4.1,1354878000,brokensail,96974,4,7,4,7,16,3.8,A hazy golden orange pour with a white head. V...
1305685,Uerige Doppel Sticke,46158,Uerige Obergärige Hausbrauerei,1332,Altbier,8.5,1473069600,Beerhammaren,156129,3,7,4,8,16,3.8,.33l bottle from Munich. Tnx Sampo. Rated at H...
6899682,Leffe Ruby,91345,InBev Belgium,260,Fruit Beer,5.0,1329908400,gunnfryd,31397,3,5,3,5,11,2.7,Bottle. Amber colour with a beige head. Aroma ...
565749,Ichnusa,8156,Heineken Italia,1395,Pale Lager,4.7,1439719200,esierra98,325795,1,3,2,5,12,2.3,Botella. Cerveza fuerte con un sabor presente....
3388706,Fremont Publican Double IPA,162695,Fremont Brewing Company,10514,Imperial IPA,8.0,1331204400,riversideAK,37954,4,6,3,6,14,3.3,"Earth, pine, dank, garlic and onion on top of ..."


In [6]:
print(len(df))


7122074
