In this notebook I am going to cover the processing of cleaning and formatting a specific dataset for EMADE.

In [1]:
import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

import shutil
import gzip

import os
import re
import math

We will start by importing all the libraries we will need. The most important libraries are numpy and pandas because we will use their data types to store and manipulate our dataset.

Note: Make sure you have nltk's corpus downloaded. You can check by running nltk.download() in a Python shell.

Next, we will import our dataset into a pandas dataframe object.

In [2]:
data = pd.read_csv("datasets/winemag-data_first150k.csv")

# Print first 5 examples
print(data.head())
print(data.shape)

# remove all rows except for the first 25,000
data = data.drop(data.index[25000:])
print(data.shape)

  country                                        description  \
0      US  This tremendous 100% varietal wine hails from ...   
1   Spain  Ripe aromas of fig, blackberry and cassis are ...   
2      US  Mac Watson honors the memory of a wine once ma...   
3      US  This spent 20 months in 30% new French oak, an...   
4  France  This is the top wine from La Bégude, named aft...   

                            designation  points  price        province  \
0                     Martha's Vineyard      96  235.0      California   
1  Carodorum Selección Especial Reserva      96  110.0  Northern Spain   
2         Special Selected Late Harvest      96   90.0      California   
3                               Reserve      96   65.0          Oregon   
4                            La Brûlade      95   66.0        Provence   

            region_1           region_2             variety  \
0        Napa Valley               Napa  Cabernet Sauvignon   
1               Toro                NaN     

The original link to the dataset can be found here: https://www.kaggle.com/zynicide/wine-reviews

However, I removed the index column from the csv in Excel.

The dataset consists of 10 features. Most of these are string values, which machine learning algorithms and data transformations will not accept. The dataset also has columns with missing values. We will either have to remove those columns or replace the missing values.

We also cut the dataset size down to 25,000 examples to reduce the size of our final dataset and RAM usage.

Note: if your computer does not have at least 8 GB of RAM, you might need to reduce the size of the data even more

Next, we will separate a column for labels and modify its values.

In [3]:
labels = data[["price"]]
print(labels.shape)

# Fill in all missing values with mean of column
labels = labels.fillna(labels.mean())

# Set all values greater or equal to 50 to 1
labels.loc[labels.price >= 50, 'price'] = 1
labels.loc[labels.price != 1, 'price'] = 0

(25000, 1)


In this case, we will choose to classify whether a wine has a price greater than or equal to 50. It is important to learn how to recognize potential labels and classification problems when you find a dataset. Most datasets will not have pre-defined labels and even those with labels you can tweak into a different problem.

We also fill in the missing price values with the mean of all price values. There are many different methods of dealing with missing values. You could remove all missing value rows or replace missing values with the median instead.

Next, we will remove irrelevant and/or unuseful features.

In [4]:
data = data.drop("price", axis=1)
data = data.drop("region_2", axis=1)
data = data.drop("winery", axis=1)
data = data.drop("designation", axis=1)

I removed these features for a specific reason. Price needs to be removed because we will add it back as labels later. Region 2 is similar enough to region_1 to not add useful information. Winery and designation have too many unique values.

When we one-hot encode some of our features in the next section, too many unique values will matter. If most of the strings in a feature are unique, we get very little useful or relevant information and only make the dimensions of our data unnecessarily large. This also applies to region 2. We do not want to double our dimensions with overlapping information.

Next, we will one-hot encoding on our remaining string features to extract relevant information from them. 

In [5]:
data = pd.get_dummies(data, columns=["country", "province", "variety", "region_1"])

An explanation of one-hot encoding can be found here: https://www.quora.com/What-is-one-hot-encoding-and-when-is-it-used-in-data-science

Next, we will separate the descriptions column and clean up the text data. One-hot encoding does not work well on columns with unique sentences or paragraphs because all of the items in the column will end up being unique.

In [6]:
text = data[["description"]].values
data = data.drop("description", axis=1)

text_list = []
for i in text:
    text_list.append(i[0])
    
# Alternative
# text = text.tolist()
# text = [i[0] for i in text]
    
# For debugging
print(text_list[0])
count = 0

stemmer = PorterStemmer()

for text in text_list:
    count += 1
    text = text.lower()
    text = re.sub('[!@#$.,?]', '', text)
    words = text.split(" ")
    words = [word for word in words if word not in stopwords.words('english')]
    words = [stemmer.stem(word) for word in words]
    text = " ".join(words)
    print("line " + str(count) + " processed")

This tremendous 100% varietal wine hails from Oakville and was aged over three years in oak. Juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. Balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. Enjoy 2022–2030.
line 1 processed
line 2 processed
line 3 processed
line 4 processed
line 5 processed
line 6 processed
line 7 processed
line 8 processed
line 9 processed
line 10 processed
line 11 processed
line 12 processed
line 13 processed
line 14 processed
line 15 processed
line 16 processed
line 17 processed
line 18 processed
line 19 processed
line 20 processed
line 21 processed
line 22 processed
line 23 processed
line 24 processed
line 25 processed
line 26 processed
line 27 processed
line 28 processed
line 29 processed
line 30 processed
line 31 processed
line 32 processed
line 33 processed
line 34 processed
line 35 processed
line 36 processed
line 

line 439 processed
line 440 processed
line 441 processed
line 442 processed
line 443 processed
line 444 processed
line 445 processed
line 446 processed
line 447 processed
line 448 processed
line 449 processed
line 450 processed
line 451 processed
line 452 processed
line 453 processed
line 454 processed
line 455 processed
line 456 processed
line 457 processed
line 458 processed
line 459 processed
line 460 processed
line 461 processed
line 462 processed
line 463 processed
line 464 processed
line 465 processed
line 466 processed
line 467 processed
line 468 processed
line 469 processed
line 470 processed
line 471 processed
line 472 processed
line 473 processed
line 474 processed
line 475 processed
line 476 processed
line 477 processed
line 478 processed
line 479 processed
line 480 processed
line 481 processed
line 482 processed
line 483 processed
line 484 processed
line 485 processed
line 486 processed
line 487 processed
line 488 processed
line 489 processed
line 490 processed
line 491 pro

line 875 processed
line 876 processed
line 877 processed
line 878 processed
line 879 processed
line 880 processed
line 881 processed
line 882 processed
line 883 processed
line 884 processed
line 885 processed
line 886 processed
line 887 processed
line 888 processed
line 889 processed
line 890 processed
line 891 processed
line 892 processed
line 893 processed
line 894 processed
line 895 processed
line 896 processed
line 897 processed
line 898 processed
line 899 processed
line 900 processed
line 901 processed
line 902 processed
line 903 processed
line 904 processed
line 905 processed
line 906 processed
line 907 processed
line 908 processed
line 909 processed
line 910 processed
line 911 processed
line 912 processed
line 913 processed
line 914 processed
line 915 processed
line 916 processed
line 917 processed
line 918 processed
line 919 processed
line 920 processed
line 921 processed
line 922 processed
line 923 processed
line 924 processed
line 925 processed
line 926 processed
line 927 pro

line 1297 processed
line 1298 processed
line 1299 processed
line 1300 processed
line 1301 processed
line 1302 processed
line 1303 processed
line 1304 processed
line 1305 processed
line 1306 processed
line 1307 processed
line 1308 processed
line 1309 processed
line 1310 processed
line 1311 processed
line 1312 processed
line 1313 processed
line 1314 processed
line 1315 processed
line 1316 processed
line 1317 processed
line 1318 processed
line 1319 processed
line 1320 processed
line 1321 processed
line 1322 processed
line 1323 processed
line 1324 processed
line 1325 processed
line 1326 processed
line 1327 processed
line 1328 processed
line 1329 processed
line 1330 processed
line 1331 processed
line 1332 processed
line 1333 processed
line 1334 processed
line 1335 processed
line 1336 processed
line 1337 processed
line 1338 processed
line 1339 processed
line 1340 processed
line 1341 processed
line 1342 processed
line 1343 processed
line 1344 processed
line 1345 processed
line 1346 processed


line 1721 processed
line 1722 processed
line 1723 processed
line 1724 processed
line 1725 processed
line 1726 processed
line 1727 processed
line 1728 processed
line 1729 processed
line 1730 processed
line 1731 processed
line 1732 processed
line 1733 processed
line 1734 processed
line 1735 processed
line 1736 processed
line 1737 processed
line 1738 processed
line 1739 processed
line 1740 processed
line 1741 processed
line 1742 processed
line 1743 processed
line 1744 processed
line 1745 processed
line 1746 processed
line 1747 processed
line 1748 processed
line 1749 processed
line 1750 processed
line 1751 processed
line 1752 processed
line 1753 processed
line 1754 processed
line 1755 processed
line 1756 processed
line 1757 processed
line 1758 processed
line 1759 processed
line 1760 processed
line 1761 processed
line 1762 processed
line 1763 processed
line 1764 processed
line 1765 processed
line 1766 processed
line 1767 processed
line 1768 processed
line 1769 processed
line 1770 processed


line 2157 processed
line 2158 processed
line 2159 processed
line 2160 processed
line 2161 processed
line 2162 processed
line 2163 processed
line 2164 processed
line 2165 processed
line 2166 processed
line 2167 processed
line 2168 processed
line 2169 processed
line 2170 processed
line 2171 processed
line 2172 processed
line 2173 processed
line 2174 processed
line 2175 processed
line 2176 processed
line 2177 processed
line 2178 processed
line 2179 processed
line 2180 processed
line 2181 processed
line 2182 processed
line 2183 processed
line 2184 processed
line 2185 processed
line 2186 processed
line 2187 processed
line 2188 processed
line 2189 processed
line 2190 processed
line 2191 processed
line 2192 processed
line 2193 processed
line 2194 processed
line 2195 processed
line 2196 processed
line 2197 processed
line 2198 processed
line 2199 processed
line 2200 processed
line 2201 processed
line 2202 processed
line 2203 processed
line 2204 processed
line 2205 processed
line 2206 processed


line 2575 processed
line 2576 processed
line 2577 processed
line 2578 processed
line 2579 processed
line 2580 processed
line 2581 processed
line 2582 processed
line 2583 processed
line 2584 processed
line 2585 processed
line 2586 processed
line 2587 processed
line 2588 processed
line 2589 processed
line 2590 processed
line 2591 processed
line 2592 processed
line 2593 processed
line 2594 processed
line 2595 processed
line 2596 processed
line 2597 processed
line 2598 processed
line 2599 processed
line 2600 processed
line 2601 processed
line 2602 processed
line 2603 processed
line 2604 processed
line 2605 processed
line 2606 processed
line 2607 processed
line 2608 processed
line 2609 processed
line 2610 processed
line 2611 processed
line 2612 processed
line 2613 processed
line 2614 processed
line 2615 processed
line 2616 processed
line 2617 processed
line 2618 processed
line 2619 processed
line 2620 processed
line 2621 processed
line 2622 processed
line 2623 processed
line 2624 processed


line 3016 processed
line 3017 processed
line 3018 processed
line 3019 processed
line 3020 processed
line 3021 processed
line 3022 processed
line 3023 processed
line 3024 processed
line 3025 processed
line 3026 processed
line 3027 processed
line 3028 processed
line 3029 processed
line 3030 processed
line 3031 processed
line 3032 processed
line 3033 processed
line 3034 processed
line 3035 processed
line 3036 processed
line 3037 processed
line 3038 processed
line 3039 processed
line 3040 processed
line 3041 processed
line 3042 processed
line 3043 processed
line 3044 processed
line 3045 processed
line 3046 processed
line 3047 processed
line 3048 processed
line 3049 processed
line 3050 processed
line 3051 processed
line 3052 processed
line 3053 processed
line 3054 processed
line 3055 processed
line 3056 processed
line 3057 processed
line 3058 processed
line 3059 processed
line 3060 processed
line 3061 processed
line 3062 processed
line 3063 processed
line 3064 processed
line 3065 processed


line 3450 processed
line 3451 processed
line 3452 processed
line 3453 processed
line 3454 processed
line 3455 processed
line 3456 processed
line 3457 processed
line 3458 processed
line 3459 processed
line 3460 processed
line 3461 processed
line 3462 processed
line 3463 processed
line 3464 processed
line 3465 processed
line 3466 processed
line 3467 processed
line 3468 processed
line 3469 processed
line 3470 processed
line 3471 processed
line 3472 processed
line 3473 processed
line 3474 processed
line 3475 processed
line 3476 processed
line 3477 processed
line 3478 processed
line 3479 processed
line 3480 processed
line 3481 processed
line 3482 processed
line 3483 processed
line 3484 processed
line 3485 processed
line 3486 processed
line 3487 processed
line 3488 processed
line 3489 processed
line 3490 processed
line 3491 processed
line 3492 processed
line 3493 processed
line 3494 processed
line 3495 processed
line 3496 processed
line 3497 processed
line 3498 processed
line 3499 processed


line 3868 processed
line 3869 processed
line 3870 processed
line 3871 processed
line 3872 processed
line 3873 processed
line 3874 processed
line 3875 processed
line 3876 processed
line 3877 processed
line 3878 processed
line 3879 processed
line 3880 processed
line 3881 processed
line 3882 processed
line 3883 processed
line 3884 processed
line 3885 processed
line 3886 processed
line 3887 processed
line 3888 processed
line 3889 processed
line 3890 processed
line 3891 processed
line 3892 processed
line 3893 processed
line 3894 processed
line 3895 processed
line 3896 processed
line 3897 processed
line 3898 processed
line 3899 processed
line 3900 processed
line 3901 processed
line 3902 processed
line 3903 processed
line 3904 processed
line 3905 processed
line 3906 processed
line 3907 processed
line 3908 processed
line 3909 processed
line 3910 processed
line 3911 processed
line 3912 processed
line 3913 processed
line 3914 processed
line 3915 processed
line 3916 processed
line 3917 processed


line 4281 processed
line 4282 processed
line 4283 processed
line 4284 processed
line 4285 processed
line 4286 processed
line 4287 processed
line 4288 processed
line 4289 processed
line 4290 processed
line 4291 processed
line 4292 processed
line 4293 processed
line 4294 processed
line 4295 processed
line 4296 processed
line 4297 processed
line 4298 processed
line 4299 processed
line 4300 processed
line 4301 processed
line 4302 processed
line 4303 processed
line 4304 processed
line 4305 processed
line 4306 processed
line 4307 processed
line 4308 processed
line 4309 processed
line 4310 processed
line 4311 processed
line 4312 processed
line 4313 processed
line 4314 processed
line 4315 processed
line 4316 processed
line 4317 processed
line 4318 processed
line 4319 processed
line 4320 processed
line 4321 processed
line 4322 processed
line 4323 processed
line 4324 processed
line 4325 processed
line 4326 processed
line 4327 processed
line 4328 processed
line 4329 processed
line 4330 processed


line 4714 processed
line 4715 processed
line 4716 processed
line 4717 processed
line 4718 processed
line 4719 processed
line 4720 processed
line 4721 processed
line 4722 processed
line 4723 processed
line 4724 processed
line 4725 processed
line 4726 processed
line 4727 processed
line 4728 processed
line 4729 processed
line 4730 processed
line 4731 processed
line 4732 processed
line 4733 processed
line 4734 processed
line 4735 processed
line 4736 processed
line 4737 processed
line 4738 processed
line 4739 processed
line 4740 processed
line 4741 processed
line 4742 processed
line 4743 processed
line 4744 processed
line 4745 processed
line 4746 processed
line 4747 processed
line 4748 processed
line 4749 processed
line 4750 processed
line 4751 processed
line 4752 processed
line 4753 processed
line 4754 processed
line 4755 processed
line 4756 processed
line 4757 processed
line 4758 processed
line 4759 processed
line 4760 processed
line 4761 processed
line 4762 processed
line 4763 processed


line 5142 processed
line 5143 processed
line 5144 processed
line 5145 processed
line 5146 processed
line 5147 processed
line 5148 processed
line 5149 processed
line 5150 processed
line 5151 processed
line 5152 processed
line 5153 processed
line 5154 processed
line 5155 processed
line 5156 processed
line 5157 processed
line 5158 processed
line 5159 processed
line 5160 processed
line 5161 processed
line 5162 processed
line 5163 processed
line 5164 processed
line 5165 processed
line 5166 processed
line 5167 processed
line 5168 processed
line 5169 processed
line 5170 processed
line 5171 processed
line 5172 processed
line 5173 processed
line 5174 processed
line 5175 processed
line 5176 processed
line 5177 processed
line 5178 processed
line 5179 processed
line 5180 processed
line 5181 processed
line 5182 processed
line 5183 processed
line 5184 processed
line 5185 processed
line 5186 processed
line 5187 processed
line 5188 processed
line 5189 processed
line 5190 processed
line 5191 processed


line 5582 processed
line 5583 processed
line 5584 processed
line 5585 processed
line 5586 processed
line 5587 processed
line 5588 processed
line 5589 processed
line 5590 processed
line 5591 processed
line 5592 processed
line 5593 processed
line 5594 processed
line 5595 processed
line 5596 processed
line 5597 processed
line 5598 processed
line 5599 processed
line 5600 processed
line 5601 processed
line 5602 processed
line 5603 processed
line 5604 processed
line 5605 processed
line 5606 processed
line 5607 processed
line 5608 processed
line 5609 processed
line 5610 processed
line 5611 processed
line 5612 processed
line 5613 processed
line 5614 processed
line 5615 processed
line 5616 processed
line 5617 processed
line 5618 processed
line 5619 processed
line 5620 processed
line 5621 processed
line 5622 processed
line 5623 processed
line 5624 processed
line 5625 processed
line 5626 processed
line 5627 processed
line 5628 processed
line 5629 processed
line 5630 processed
line 5631 processed


line 5993 processed
line 5994 processed
line 5995 processed
line 5996 processed
line 5997 processed
line 5998 processed
line 5999 processed
line 6000 processed
line 6001 processed
line 6002 processed
line 6003 processed
line 6004 processed
line 6005 processed
line 6006 processed
line 6007 processed
line 6008 processed
line 6009 processed
line 6010 processed
line 6011 processed
line 6012 processed
line 6013 processed
line 6014 processed
line 6015 processed
line 6016 processed
line 6017 processed
line 6018 processed
line 6019 processed
line 6020 processed
line 6021 processed
line 6022 processed
line 6023 processed
line 6024 processed
line 6025 processed
line 6026 processed
line 6027 processed
line 6028 processed
line 6029 processed
line 6030 processed
line 6031 processed
line 6032 processed
line 6033 processed
line 6034 processed
line 6035 processed
line 6036 processed
line 6037 processed
line 6038 processed
line 6039 processed
line 6040 processed
line 6041 processed
line 6042 processed


line 6431 processed
line 6432 processed
line 6433 processed
line 6434 processed
line 6435 processed
line 6436 processed
line 6437 processed
line 6438 processed
line 6439 processed
line 6440 processed
line 6441 processed
line 6442 processed
line 6443 processed
line 6444 processed
line 6445 processed
line 6446 processed
line 6447 processed
line 6448 processed
line 6449 processed
line 6450 processed
line 6451 processed
line 6452 processed
line 6453 processed
line 6454 processed
line 6455 processed
line 6456 processed
line 6457 processed
line 6458 processed
line 6459 processed
line 6460 processed
line 6461 processed
line 6462 processed
line 6463 processed
line 6464 processed
line 6465 processed
line 6466 processed
line 6467 processed
line 6468 processed
line 6469 processed
line 6470 processed
line 6471 processed
line 6472 processed
line 6473 processed
line 6474 processed
line 6475 processed
line 6476 processed
line 6477 processed
line 6478 processed
line 6479 processed
line 6480 processed


line 6842 processed
line 6843 processed
line 6844 processed
line 6845 processed
line 6846 processed
line 6847 processed
line 6848 processed
line 6849 processed
line 6850 processed
line 6851 processed
line 6852 processed
line 6853 processed
line 6854 processed
line 6855 processed
line 6856 processed
line 6857 processed
line 6858 processed
line 6859 processed
line 6860 processed
line 6861 processed
line 6862 processed
line 6863 processed
line 6864 processed
line 6865 processed
line 6866 processed
line 6867 processed
line 6868 processed
line 6869 processed
line 6870 processed
line 6871 processed
line 6872 processed
line 6873 processed
line 6874 processed
line 6875 processed
line 6876 processed
line 6877 processed
line 6878 processed
line 6879 processed
line 6880 processed
line 6881 processed
line 6882 processed
line 6883 processed
line 6884 processed
line 6885 processed
line 6886 processed
line 6887 processed
line 6888 processed
line 6889 processed
line 6890 processed
line 6891 processed


line 7254 processed
line 7255 processed
line 7256 processed
line 7257 processed
line 7258 processed
line 7259 processed
line 7260 processed
line 7261 processed
line 7262 processed
line 7263 processed
line 7264 processed
line 7265 processed
line 7266 processed
line 7267 processed
line 7268 processed
line 7269 processed
line 7270 processed
line 7271 processed
line 7272 processed
line 7273 processed
line 7274 processed
line 7275 processed
line 7276 processed
line 7277 processed
line 7278 processed
line 7279 processed
line 7280 processed
line 7281 processed
line 7282 processed
line 7283 processed
line 7284 processed
line 7285 processed
line 7286 processed
line 7287 processed
line 7288 processed
line 7289 processed
line 7290 processed
line 7291 processed
line 7292 processed
line 7293 processed
line 7294 processed
line 7295 processed
line 7296 processed
line 7297 processed
line 7298 processed
line 7299 processed
line 7300 processed
line 7301 processed
line 7302 processed
line 7303 processed


line 7685 processed
line 7686 processed
line 7687 processed
line 7688 processed
line 7689 processed
line 7690 processed
line 7691 processed
line 7692 processed
line 7693 processed
line 7694 processed
line 7695 processed
line 7696 processed
line 7697 processed
line 7698 processed
line 7699 processed
line 7700 processed
line 7701 processed
line 7702 processed
line 7703 processed
line 7704 processed
line 7705 processed
line 7706 processed
line 7707 processed
line 7708 processed
line 7709 processed
line 7710 processed
line 7711 processed
line 7712 processed
line 7713 processed
line 7714 processed
line 7715 processed
line 7716 processed
line 7717 processed
line 7718 processed
line 7719 processed
line 7720 processed
line 7721 processed
line 7722 processed
line 7723 processed
line 7724 processed
line 7725 processed
line 7726 processed
line 7727 processed
line 7728 processed
line 7729 processed
line 7730 processed
line 7731 processed
line 7732 processed
line 7733 processed
line 7734 processed


line 8107 processed
line 8108 processed
line 8109 processed
line 8110 processed
line 8111 processed
line 8112 processed
line 8113 processed
line 8114 processed
line 8115 processed
line 8116 processed
line 8117 processed
line 8118 processed
line 8119 processed
line 8120 processed
line 8121 processed
line 8122 processed
line 8123 processed
line 8124 processed
line 8125 processed
line 8126 processed
line 8127 processed
line 8128 processed
line 8129 processed
line 8130 processed
line 8131 processed
line 8132 processed
line 8133 processed
line 8134 processed
line 8135 processed
line 8136 processed
line 8137 processed
line 8138 processed
line 8139 processed
line 8140 processed
line 8141 processed
line 8142 processed
line 8143 processed
line 8144 processed
line 8145 processed
line 8146 processed
line 8147 processed
line 8148 processed
line 8149 processed
line 8150 processed
line 8151 processed
line 8152 processed
line 8153 processed
line 8154 processed
line 8155 processed
line 8156 processed


line 8528 processed
line 8529 processed
line 8530 processed
line 8531 processed
line 8532 processed
line 8533 processed
line 8534 processed
line 8535 processed
line 8536 processed
line 8537 processed
line 8538 processed
line 8539 processed
line 8540 processed
line 8541 processed
line 8542 processed
line 8543 processed
line 8544 processed
line 8545 processed
line 8546 processed
line 8547 processed
line 8548 processed
line 8549 processed
line 8550 processed
line 8551 processed
line 8552 processed
line 8553 processed
line 8554 processed
line 8555 processed
line 8556 processed
line 8557 processed
line 8558 processed
line 8559 processed
line 8560 processed
line 8561 processed
line 8562 processed
line 8563 processed
line 8564 processed
line 8565 processed
line 8566 processed
line 8567 processed
line 8568 processed
line 8569 processed
line 8570 processed
line 8571 processed
line 8572 processed
line 8573 processed
line 8574 processed
line 8575 processed
line 8576 processed
line 8577 processed


line 8945 processed
line 8946 processed
line 8947 processed
line 8948 processed
line 8949 processed
line 8950 processed
line 8951 processed
line 8952 processed
line 8953 processed
line 8954 processed
line 8955 processed
line 8956 processed
line 8957 processed
line 8958 processed
line 8959 processed
line 8960 processed
line 8961 processed
line 8962 processed
line 8963 processed
line 8964 processed
line 8965 processed
line 8966 processed
line 8967 processed
line 8968 processed
line 8969 processed
line 8970 processed
line 8971 processed
line 8972 processed
line 8973 processed
line 8974 processed
line 8975 processed
line 8976 processed
line 8977 processed
line 8978 processed
line 8979 processed
line 8980 processed
line 8981 processed
line 8982 processed
line 8983 processed
line 8984 processed
line 8985 processed
line 8986 processed
line 8987 processed
line 8988 processed
line 8989 processed
line 8990 processed
line 8991 processed
line 8992 processed
line 8993 processed
line 8994 processed


line 9364 processed
line 9365 processed
line 9366 processed
line 9367 processed
line 9368 processed
line 9369 processed
line 9370 processed
line 9371 processed
line 9372 processed
line 9373 processed
line 9374 processed
line 9375 processed
line 9376 processed
line 9377 processed
line 9378 processed
line 9379 processed
line 9380 processed
line 9381 processed
line 9382 processed
line 9383 processed
line 9384 processed
line 9385 processed
line 9386 processed
line 9387 processed
line 9388 processed
line 9389 processed
line 9390 processed
line 9391 processed
line 9392 processed
line 9393 processed
line 9394 processed
line 9395 processed
line 9396 processed
line 9397 processed
line 9398 processed
line 9399 processed
line 9400 processed
line 9401 processed
line 9402 processed
line 9403 processed
line 9404 processed
line 9405 processed
line 9406 processed
line 9407 processed
line 9408 processed
line 9409 processed
line 9410 processed
line 9411 processed
line 9412 processed
line 9413 processed


line 9797 processed
line 9798 processed
line 9799 processed
line 9800 processed
line 9801 processed
line 9802 processed
line 9803 processed
line 9804 processed
line 9805 processed
line 9806 processed
line 9807 processed
line 9808 processed
line 9809 processed
line 9810 processed
line 9811 processed
line 9812 processed
line 9813 processed
line 9814 processed
line 9815 processed
line 9816 processed
line 9817 processed
line 9818 processed
line 9819 processed
line 9820 processed
line 9821 processed
line 9822 processed
line 9823 processed
line 9824 processed
line 9825 processed
line 9826 processed
line 9827 processed
line 9828 processed
line 9829 processed
line 9830 processed
line 9831 processed
line 9832 processed
line 9833 processed
line 9834 processed
line 9835 processed
line 9836 processed
line 9837 processed
line 9838 processed
line 9839 processed
line 9840 processed
line 9841 processed
line 9842 processed
line 9843 processed
line 9844 processed
line 9845 processed
line 9846 processed


line 10207 processed
line 10208 processed
line 10209 processed
line 10210 processed
line 10211 processed
line 10212 processed
line 10213 processed
line 10214 processed
line 10215 processed
line 10216 processed
line 10217 processed
line 10218 processed
line 10219 processed
line 10220 processed
line 10221 processed
line 10222 processed
line 10223 processed
line 10224 processed
line 10225 processed
line 10226 processed
line 10227 processed
line 10228 processed
line 10229 processed
line 10230 processed
line 10231 processed
line 10232 processed
line 10233 processed
line 10234 processed
line 10235 processed
line 10236 processed
line 10237 processed
line 10238 processed
line 10239 processed
line 10240 processed
line 10241 processed
line 10242 processed
line 10243 processed
line 10244 processed
line 10245 processed
line 10246 processed
line 10247 processed
line 10248 processed
line 10249 processed
line 10250 processed
line 10251 processed
line 10252 processed
line 10253 processed
line 10254 pr

line 10608 processed
line 10609 processed
line 10610 processed
line 10611 processed
line 10612 processed
line 10613 processed
line 10614 processed
line 10615 processed
line 10616 processed
line 10617 processed
line 10618 processed
line 10619 processed
line 10620 processed
line 10621 processed
line 10622 processed
line 10623 processed
line 10624 processed
line 10625 processed
line 10626 processed
line 10627 processed
line 10628 processed
line 10629 processed
line 10630 processed
line 10631 processed
line 10632 processed
line 10633 processed
line 10634 processed
line 10635 processed
line 10636 processed
line 10637 processed
line 10638 processed
line 10639 processed
line 10640 processed
line 10641 processed
line 10642 processed
line 10643 processed
line 10644 processed
line 10645 processed
line 10646 processed
line 10647 processed
line 10648 processed
line 10649 processed
line 10650 processed
line 10651 processed
line 10652 processed
line 10653 processed
line 10654 processed
line 10655 pr

line 11012 processed
line 11013 processed
line 11014 processed
line 11015 processed
line 11016 processed
line 11017 processed
line 11018 processed
line 11019 processed
line 11020 processed
line 11021 processed
line 11022 processed
line 11023 processed
line 11024 processed
line 11025 processed
line 11026 processed
line 11027 processed
line 11028 processed
line 11029 processed
line 11030 processed
line 11031 processed
line 11032 processed
line 11033 processed
line 11034 processed
line 11035 processed
line 11036 processed
line 11037 processed
line 11038 processed
line 11039 processed
line 11040 processed
line 11041 processed
line 11042 processed
line 11043 processed
line 11044 processed
line 11045 processed
line 11046 processed
line 11047 processed
line 11048 processed
line 11049 processed
line 11050 processed
line 11051 processed
line 11052 processed
line 11053 processed
line 11054 processed
line 11055 processed
line 11056 processed
line 11057 processed
line 11058 processed
line 11059 pr

line 11403 processed
line 11404 processed
line 11405 processed
line 11406 processed
line 11407 processed
line 11408 processed
line 11409 processed
line 11410 processed
line 11411 processed
line 11412 processed
line 11413 processed
line 11414 processed
line 11415 processed
line 11416 processed
line 11417 processed
line 11418 processed
line 11419 processed
line 11420 processed
line 11421 processed
line 11422 processed
line 11423 processed
line 11424 processed
line 11425 processed
line 11426 processed
line 11427 processed
line 11428 processed
line 11429 processed
line 11430 processed
line 11431 processed
line 11432 processed
line 11433 processed
line 11434 processed
line 11435 processed
line 11436 processed
line 11437 processed
line 11438 processed
line 11439 processed
line 11440 processed
line 11441 processed
line 11442 processed
line 11443 processed
line 11444 processed
line 11445 processed
line 11446 processed
line 11447 processed
line 11448 processed
line 11449 processed
line 11450 pr

line 11806 processed
line 11807 processed
line 11808 processed
line 11809 processed
line 11810 processed
line 11811 processed
line 11812 processed
line 11813 processed
line 11814 processed
line 11815 processed
line 11816 processed
line 11817 processed
line 11818 processed
line 11819 processed
line 11820 processed
line 11821 processed
line 11822 processed
line 11823 processed
line 11824 processed
line 11825 processed
line 11826 processed
line 11827 processed
line 11828 processed
line 11829 processed
line 11830 processed
line 11831 processed
line 11832 processed
line 11833 processed
line 11834 processed
line 11835 processed
line 11836 processed
line 11837 processed
line 11838 processed
line 11839 processed
line 11840 processed
line 11841 processed
line 11842 processed
line 11843 processed
line 11844 processed
line 11845 processed
line 11846 processed
line 11847 processed
line 11848 processed
line 11849 processed
line 11850 processed
line 11851 processed
line 11852 processed
line 11853 pr

line 12199 processed
line 12200 processed
line 12201 processed
line 12202 processed
line 12203 processed
line 12204 processed
line 12205 processed
line 12206 processed
line 12207 processed
line 12208 processed
line 12209 processed
line 12210 processed
line 12211 processed
line 12212 processed
line 12213 processed
line 12214 processed
line 12215 processed
line 12216 processed
line 12217 processed
line 12218 processed
line 12219 processed
line 12220 processed
line 12221 processed
line 12222 processed
line 12223 processed
line 12224 processed
line 12225 processed
line 12226 processed
line 12227 processed
line 12228 processed
line 12229 processed
line 12230 processed
line 12231 processed
line 12232 processed
line 12233 processed
line 12234 processed
line 12235 processed
line 12236 processed
line 12237 processed
line 12238 processed
line 12239 processed
line 12240 processed
line 12241 processed
line 12242 processed
line 12243 processed
line 12244 processed
line 12245 processed
line 12246 pr

line 12599 processed
line 12600 processed
line 12601 processed
line 12602 processed
line 12603 processed
line 12604 processed
line 12605 processed
line 12606 processed
line 12607 processed
line 12608 processed
line 12609 processed
line 12610 processed
line 12611 processed
line 12612 processed
line 12613 processed
line 12614 processed
line 12615 processed
line 12616 processed
line 12617 processed
line 12618 processed
line 12619 processed
line 12620 processed
line 12621 processed
line 12622 processed
line 12623 processed
line 12624 processed
line 12625 processed
line 12626 processed
line 12627 processed
line 12628 processed
line 12629 processed
line 12630 processed
line 12631 processed
line 12632 processed
line 12633 processed
line 12634 processed
line 12635 processed
line 12636 processed
line 12637 processed
line 12638 processed
line 12639 processed
line 12640 processed
line 12641 processed
line 12642 processed
line 12643 processed
line 12644 processed
line 12645 processed
line 12646 pr

line 12997 processed
line 12998 processed
line 12999 processed
line 13000 processed
line 13001 processed
line 13002 processed
line 13003 processed
line 13004 processed
line 13005 processed
line 13006 processed
line 13007 processed
line 13008 processed
line 13009 processed
line 13010 processed
line 13011 processed
line 13012 processed
line 13013 processed
line 13014 processed
line 13015 processed
line 13016 processed
line 13017 processed
line 13018 processed
line 13019 processed
line 13020 processed
line 13021 processed
line 13022 processed
line 13023 processed
line 13024 processed
line 13025 processed
line 13026 processed
line 13027 processed
line 13028 processed
line 13029 processed
line 13030 processed
line 13031 processed
line 13032 processed
line 13033 processed
line 13034 processed
line 13035 processed
line 13036 processed
line 13037 processed
line 13038 processed
line 13039 processed
line 13040 processed
line 13041 processed
line 13042 processed
line 13043 processed
line 13044 pr

line 13388 processed
line 13389 processed
line 13390 processed
line 13391 processed
line 13392 processed
line 13393 processed
line 13394 processed
line 13395 processed
line 13396 processed
line 13397 processed
line 13398 processed
line 13399 processed
line 13400 processed
line 13401 processed
line 13402 processed
line 13403 processed
line 13404 processed
line 13405 processed
line 13406 processed
line 13407 processed
line 13408 processed
line 13409 processed
line 13410 processed
line 13411 processed
line 13412 processed
line 13413 processed
line 13414 processed
line 13415 processed
line 13416 processed
line 13417 processed
line 13418 processed
line 13419 processed
line 13420 processed
line 13421 processed
line 13422 processed
line 13423 processed
line 13424 processed
line 13425 processed
line 13426 processed
line 13427 processed
line 13428 processed
line 13429 processed
line 13430 processed
line 13431 processed
line 13432 processed
line 13433 processed
line 13434 processed
line 13435 pr

line 13784 processed
line 13785 processed
line 13786 processed
line 13787 processed
line 13788 processed
line 13789 processed
line 13790 processed
line 13791 processed
line 13792 processed
line 13793 processed
line 13794 processed
line 13795 processed
line 13796 processed
line 13797 processed
line 13798 processed
line 13799 processed
line 13800 processed
line 13801 processed
line 13802 processed
line 13803 processed
line 13804 processed
line 13805 processed
line 13806 processed
line 13807 processed
line 13808 processed
line 13809 processed
line 13810 processed
line 13811 processed
line 13812 processed
line 13813 processed
line 13814 processed
line 13815 processed
line 13816 processed
line 13817 processed
line 13818 processed
line 13819 processed
line 13820 processed
line 13821 processed
line 13822 processed
line 13823 processed
line 13824 processed
line 13825 processed
line 13826 processed
line 13827 processed
line 13828 processed
line 13829 processed
line 13830 processed
line 13831 pr

line 14177 processed
line 14178 processed
line 14179 processed
line 14180 processed
line 14181 processed
line 14182 processed
line 14183 processed
line 14184 processed
line 14185 processed
line 14186 processed
line 14187 processed
line 14188 processed
line 14189 processed
line 14190 processed
line 14191 processed
line 14192 processed
line 14193 processed
line 14194 processed
line 14195 processed
line 14196 processed
line 14197 processed
line 14198 processed
line 14199 processed
line 14200 processed
line 14201 processed
line 14202 processed
line 14203 processed
line 14204 processed
line 14205 processed
line 14206 processed
line 14207 processed
line 14208 processed
line 14209 processed
line 14210 processed
line 14211 processed
line 14212 processed
line 14213 processed
line 14214 processed
line 14215 processed
line 14216 processed
line 14217 processed
line 14218 processed
line 14219 processed
line 14220 processed
line 14221 processed
line 14222 processed
line 14223 processed
line 14224 pr

line 14573 processed
line 14574 processed
line 14575 processed
line 14576 processed
line 14577 processed
line 14578 processed
line 14579 processed
line 14580 processed
line 14581 processed
line 14582 processed
line 14583 processed
line 14584 processed
line 14585 processed
line 14586 processed
line 14587 processed
line 14588 processed
line 14589 processed
line 14590 processed
line 14591 processed
line 14592 processed
line 14593 processed
line 14594 processed
line 14595 processed
line 14596 processed
line 14597 processed
line 14598 processed
line 14599 processed
line 14600 processed
line 14601 processed
line 14602 processed
line 14603 processed
line 14604 processed
line 14605 processed
line 14606 processed
line 14607 processed
line 14608 processed
line 14609 processed
line 14610 processed
line 14611 processed
line 14612 processed
line 14613 processed
line 14614 processed
line 14615 processed
line 14616 processed
line 14617 processed
line 14618 processed
line 14619 processed
line 14620 pr

line 14974 processed
line 14975 processed
line 14976 processed
line 14977 processed
line 14978 processed
line 14979 processed
line 14980 processed
line 14981 processed
line 14982 processed
line 14983 processed
line 14984 processed
line 14985 processed
line 14986 processed
line 14987 processed
line 14988 processed
line 14989 processed
line 14990 processed
line 14991 processed
line 14992 processed
line 14993 processed
line 14994 processed
line 14995 processed
line 14996 processed
line 14997 processed
line 14998 processed
line 14999 processed
line 15000 processed
line 15001 processed
line 15002 processed
line 15003 processed
line 15004 processed
line 15005 processed
line 15006 processed
line 15007 processed
line 15008 processed
line 15009 processed
line 15010 processed
line 15011 processed
line 15012 processed
line 15013 processed
line 15014 processed
line 15015 processed
line 15016 processed
line 15017 processed
line 15018 processed
line 15019 processed
line 15020 processed
line 15021 pr

line 15384 processed
line 15385 processed
line 15386 processed
line 15387 processed
line 15388 processed
line 15389 processed
line 15390 processed
line 15391 processed
line 15392 processed
line 15393 processed
line 15394 processed
line 15395 processed
line 15396 processed
line 15397 processed
line 15398 processed
line 15399 processed
line 15400 processed
line 15401 processed
line 15402 processed
line 15403 processed
line 15404 processed
line 15405 processed
line 15406 processed
line 15407 processed
line 15408 processed
line 15409 processed
line 15410 processed
line 15411 processed
line 15412 processed
line 15413 processed
line 15414 processed
line 15415 processed
line 15416 processed
line 15417 processed
line 15418 processed
line 15419 processed
line 15420 processed
line 15421 processed
line 15422 processed
line 15423 processed
line 15424 processed
line 15425 processed
line 15426 processed
line 15427 processed
line 15428 processed
line 15429 processed
line 15430 processed
line 15431 pr

line 15788 processed
line 15789 processed
line 15790 processed
line 15791 processed
line 15792 processed
line 15793 processed
line 15794 processed
line 15795 processed
line 15796 processed
line 15797 processed
line 15798 processed
line 15799 processed
line 15800 processed
line 15801 processed
line 15802 processed
line 15803 processed
line 15804 processed
line 15805 processed
line 15806 processed
line 15807 processed
line 15808 processed
line 15809 processed
line 15810 processed
line 15811 processed
line 15812 processed
line 15813 processed
line 15814 processed
line 15815 processed
line 15816 processed
line 15817 processed
line 15818 processed
line 15819 processed
line 15820 processed
line 15821 processed
line 15822 processed
line 15823 processed
line 15824 processed
line 15825 processed
line 15826 processed
line 15827 processed
line 15828 processed
line 15829 processed
line 15830 processed
line 15831 processed
line 15832 processed
line 15833 processed
line 15834 processed
line 15835 pr

line 16198 processed
line 16199 processed
line 16200 processed
line 16201 processed
line 16202 processed
line 16203 processed
line 16204 processed
line 16205 processed
line 16206 processed
line 16207 processed
line 16208 processed
line 16209 processed
line 16210 processed
line 16211 processed
line 16212 processed
line 16213 processed
line 16214 processed
line 16215 processed
line 16216 processed
line 16217 processed
line 16218 processed
line 16219 processed
line 16220 processed
line 16221 processed
line 16222 processed
line 16223 processed
line 16224 processed
line 16225 processed
line 16226 processed
line 16227 processed
line 16228 processed
line 16229 processed
line 16230 processed
line 16231 processed
line 16232 processed
line 16233 processed
line 16234 processed
line 16235 processed
line 16236 processed
line 16237 processed
line 16238 processed
line 16239 processed
line 16240 processed
line 16241 processed
line 16242 processed
line 16243 processed
line 16244 processed
line 16245 pr

line 16605 processed
line 16606 processed
line 16607 processed
line 16608 processed
line 16609 processed
line 16610 processed
line 16611 processed
line 16612 processed
line 16613 processed
line 16614 processed
line 16615 processed
line 16616 processed
line 16617 processed
line 16618 processed
line 16619 processed
line 16620 processed
line 16621 processed
line 16622 processed
line 16623 processed
line 16624 processed
line 16625 processed
line 16626 processed
line 16627 processed
line 16628 processed
line 16629 processed
line 16630 processed
line 16631 processed
line 16632 processed
line 16633 processed
line 16634 processed
line 16635 processed
line 16636 processed
line 16637 processed
line 16638 processed
line 16639 processed
line 16640 processed
line 16641 processed
line 16642 processed
line 16643 processed
line 16644 processed
line 16645 processed
line 16646 processed
line 16647 processed
line 16648 processed
line 16649 processed
line 16650 processed
line 16651 processed
line 16652 pr

line 16997 processed
line 16998 processed
line 16999 processed
line 17000 processed
line 17001 processed
line 17002 processed
line 17003 processed
line 17004 processed
line 17005 processed
line 17006 processed
line 17007 processed
line 17008 processed
line 17009 processed
line 17010 processed
line 17011 processed
line 17012 processed
line 17013 processed
line 17014 processed
line 17015 processed
line 17016 processed
line 17017 processed
line 17018 processed
line 17019 processed
line 17020 processed
line 17021 processed
line 17022 processed
line 17023 processed
line 17024 processed
line 17025 processed
line 17026 processed
line 17027 processed
line 17028 processed
line 17029 processed
line 17030 processed
line 17031 processed
line 17032 processed
line 17033 processed
line 17034 processed
line 17035 processed
line 17036 processed
line 17037 processed
line 17038 processed
line 17039 processed
line 17040 processed
line 17041 processed
line 17042 processed
line 17043 processed
line 17044 pr

line 17406 processed
line 17407 processed
line 17408 processed
line 17409 processed
line 17410 processed
line 17411 processed
line 17412 processed
line 17413 processed
line 17414 processed
line 17415 processed
line 17416 processed
line 17417 processed
line 17418 processed
line 17419 processed
line 17420 processed
line 17421 processed
line 17422 processed
line 17423 processed
line 17424 processed
line 17425 processed
line 17426 processed
line 17427 processed
line 17428 processed
line 17429 processed
line 17430 processed
line 17431 processed
line 17432 processed
line 17433 processed
line 17434 processed
line 17435 processed
line 17436 processed
line 17437 processed
line 17438 processed
line 17439 processed
line 17440 processed
line 17441 processed
line 17442 processed
line 17443 processed
line 17444 processed
line 17445 processed
line 17446 processed
line 17447 processed
line 17448 processed
line 17449 processed
line 17450 processed
line 17451 processed
line 17452 processed
line 17453 pr

line 17802 processed
line 17803 processed
line 17804 processed
line 17805 processed
line 17806 processed
line 17807 processed
line 17808 processed
line 17809 processed
line 17810 processed
line 17811 processed
line 17812 processed
line 17813 processed
line 17814 processed
line 17815 processed
line 17816 processed
line 17817 processed
line 17818 processed
line 17819 processed
line 17820 processed
line 17821 processed
line 17822 processed
line 17823 processed
line 17824 processed
line 17825 processed
line 17826 processed
line 17827 processed
line 17828 processed
line 17829 processed
line 17830 processed
line 17831 processed
line 17832 processed
line 17833 processed
line 17834 processed
line 17835 processed
line 17836 processed
line 17837 processed
line 17838 processed
line 17839 processed
line 17840 processed
line 17841 processed
line 17842 processed
line 17843 processed
line 17844 processed
line 17845 processed
line 17846 processed
line 17847 processed
line 17848 processed
line 17849 pr

line 18193 processed
line 18194 processed
line 18195 processed
line 18196 processed
line 18197 processed
line 18198 processed
line 18199 processed
line 18200 processed
line 18201 processed
line 18202 processed
line 18203 processed
line 18204 processed
line 18205 processed
line 18206 processed
line 18207 processed
line 18208 processed
line 18209 processed
line 18210 processed
line 18211 processed
line 18212 processed
line 18213 processed
line 18214 processed
line 18215 processed
line 18216 processed
line 18217 processed
line 18218 processed
line 18219 processed
line 18220 processed
line 18221 processed
line 18222 processed
line 18223 processed
line 18224 processed
line 18225 processed
line 18226 processed
line 18227 processed
line 18228 processed
line 18229 processed
line 18230 processed
line 18231 processed
line 18232 processed
line 18233 processed
line 18234 processed
line 18235 processed
line 18236 processed
line 18237 processed
line 18238 processed
line 18239 processed
line 18240 pr

line 18590 processed
line 18591 processed
line 18592 processed
line 18593 processed
line 18594 processed
line 18595 processed
line 18596 processed
line 18597 processed
line 18598 processed
line 18599 processed
line 18600 processed
line 18601 processed
line 18602 processed
line 18603 processed
line 18604 processed
line 18605 processed
line 18606 processed
line 18607 processed
line 18608 processed
line 18609 processed
line 18610 processed
line 18611 processed
line 18612 processed
line 18613 processed
line 18614 processed
line 18615 processed
line 18616 processed
line 18617 processed
line 18618 processed
line 18619 processed
line 18620 processed
line 18621 processed
line 18622 processed
line 18623 processed
line 18624 processed
line 18625 processed
line 18626 processed
line 18627 processed
line 18628 processed
line 18629 processed
line 18630 processed
line 18631 processed
line 18632 processed
line 18633 processed
line 18634 processed
line 18635 processed
line 18636 processed
line 18637 pr

line 18983 processed
line 18984 processed
line 18985 processed
line 18986 processed
line 18987 processed
line 18988 processed
line 18989 processed
line 18990 processed
line 18991 processed
line 18992 processed
line 18993 processed
line 18994 processed
line 18995 processed
line 18996 processed
line 18997 processed
line 18998 processed
line 18999 processed
line 19000 processed
line 19001 processed
line 19002 processed
line 19003 processed
line 19004 processed
line 19005 processed
line 19006 processed
line 19007 processed
line 19008 processed
line 19009 processed
line 19010 processed
line 19011 processed
line 19012 processed
line 19013 processed
line 19014 processed
line 19015 processed
line 19016 processed
line 19017 processed
line 19018 processed
line 19019 processed
line 19020 processed
line 19021 processed
line 19022 processed
line 19023 processed
line 19024 processed
line 19025 processed
line 19026 processed
line 19027 processed
line 19028 processed
line 19029 processed
line 19030 pr

line 19377 processed
line 19378 processed
line 19379 processed
line 19380 processed
line 19381 processed
line 19382 processed
line 19383 processed
line 19384 processed
line 19385 processed
line 19386 processed
line 19387 processed
line 19388 processed
line 19389 processed
line 19390 processed
line 19391 processed
line 19392 processed
line 19393 processed
line 19394 processed
line 19395 processed
line 19396 processed
line 19397 processed
line 19398 processed
line 19399 processed
line 19400 processed
line 19401 processed
line 19402 processed
line 19403 processed
line 19404 processed
line 19405 processed
line 19406 processed
line 19407 processed
line 19408 processed
line 19409 processed
line 19410 processed
line 19411 processed
line 19412 processed
line 19413 processed
line 19414 processed
line 19415 processed
line 19416 processed
line 19417 processed
line 19418 processed
line 19419 processed
line 19420 processed
line 19421 processed
line 19422 processed
line 19423 processed
line 19424 pr

line 19780 processed
line 19781 processed
line 19782 processed
line 19783 processed
line 19784 processed
line 19785 processed
line 19786 processed
line 19787 processed
line 19788 processed
line 19789 processed
line 19790 processed
line 19791 processed
line 19792 processed
line 19793 processed
line 19794 processed
line 19795 processed
line 19796 processed
line 19797 processed
line 19798 processed
line 19799 processed
line 19800 processed
line 19801 processed
line 19802 processed
line 19803 processed
line 19804 processed
line 19805 processed
line 19806 processed
line 19807 processed
line 19808 processed
line 19809 processed
line 19810 processed
line 19811 processed
line 19812 processed
line 19813 processed
line 19814 processed
line 19815 processed
line 19816 processed
line 19817 processed
line 19818 processed
line 19819 processed
line 19820 processed
line 19821 processed
line 19822 processed
line 19823 processed
line 19824 processed
line 19825 processed
line 19826 processed
line 19827 pr

line 20192 processed
line 20193 processed
line 20194 processed
line 20195 processed
line 20196 processed
line 20197 processed
line 20198 processed
line 20199 processed
line 20200 processed
line 20201 processed
line 20202 processed
line 20203 processed
line 20204 processed
line 20205 processed
line 20206 processed
line 20207 processed
line 20208 processed
line 20209 processed
line 20210 processed
line 20211 processed
line 20212 processed
line 20213 processed
line 20214 processed
line 20215 processed
line 20216 processed
line 20217 processed
line 20218 processed
line 20219 processed
line 20220 processed
line 20221 processed
line 20222 processed
line 20223 processed
line 20224 processed
line 20225 processed
line 20226 processed
line 20227 processed
line 20228 processed
line 20229 processed
line 20230 processed
line 20231 processed
line 20232 processed
line 20233 processed
line 20234 processed
line 20235 processed
line 20236 processed
line 20237 processed
line 20238 processed
line 20239 pr

line 20602 processed
line 20603 processed
line 20604 processed
line 20605 processed
line 20606 processed
line 20607 processed
line 20608 processed
line 20609 processed
line 20610 processed
line 20611 processed
line 20612 processed
line 20613 processed
line 20614 processed
line 20615 processed
line 20616 processed
line 20617 processed
line 20618 processed
line 20619 processed
line 20620 processed
line 20621 processed
line 20622 processed
line 20623 processed
line 20624 processed
line 20625 processed
line 20626 processed
line 20627 processed
line 20628 processed
line 20629 processed
line 20630 processed
line 20631 processed
line 20632 processed
line 20633 processed
line 20634 processed
line 20635 processed
line 20636 processed
line 20637 processed
line 20638 processed
line 20639 processed
line 20640 processed
line 20641 processed
line 20642 processed
line 20643 processed
line 20644 processed
line 20645 processed
line 20646 processed
line 20647 processed
line 20648 processed
line 20649 pr

line 21004 processed
line 21005 processed
line 21006 processed
line 21007 processed
line 21008 processed
line 21009 processed
line 21010 processed
line 21011 processed
line 21012 processed
line 21013 processed
line 21014 processed
line 21015 processed
line 21016 processed
line 21017 processed
line 21018 processed
line 21019 processed
line 21020 processed
line 21021 processed
line 21022 processed
line 21023 processed
line 21024 processed
line 21025 processed
line 21026 processed
line 21027 processed
line 21028 processed
line 21029 processed
line 21030 processed
line 21031 processed
line 21032 processed
line 21033 processed
line 21034 processed
line 21035 processed
line 21036 processed
line 21037 processed
line 21038 processed
line 21039 processed
line 21040 processed
line 21041 processed
line 21042 processed
line 21043 processed
line 21044 processed
line 21045 processed
line 21046 processed
line 21047 processed
line 21048 processed
line 21049 processed
line 21050 processed
line 21051 pr

line 21407 processed
line 21408 processed
line 21409 processed
line 21410 processed
line 21411 processed
line 21412 processed
line 21413 processed
line 21414 processed
line 21415 processed
line 21416 processed
line 21417 processed
line 21418 processed
line 21419 processed
line 21420 processed
line 21421 processed
line 21422 processed
line 21423 processed
line 21424 processed
line 21425 processed
line 21426 processed
line 21427 processed
line 21428 processed
line 21429 processed
line 21430 processed
line 21431 processed
line 21432 processed
line 21433 processed
line 21434 processed
line 21435 processed
line 21436 processed
line 21437 processed
line 21438 processed
line 21439 processed
line 21440 processed
line 21441 processed
line 21442 processed
line 21443 processed
line 21444 processed
line 21445 processed
line 21446 processed
line 21447 processed
line 21448 processed
line 21449 processed
line 21450 processed
line 21451 processed
line 21452 processed
line 21453 processed
line 21454 pr

line 21808 processed
line 21809 processed
line 21810 processed
line 21811 processed
line 21812 processed
line 21813 processed
line 21814 processed
line 21815 processed
line 21816 processed
line 21817 processed
line 21818 processed
line 21819 processed
line 21820 processed
line 21821 processed
line 21822 processed
line 21823 processed
line 21824 processed
line 21825 processed
line 21826 processed
line 21827 processed
line 21828 processed
line 21829 processed
line 21830 processed
line 21831 processed
line 21832 processed
line 21833 processed
line 21834 processed
line 21835 processed
line 21836 processed
line 21837 processed
line 21838 processed
line 21839 processed
line 21840 processed
line 21841 processed
line 21842 processed
line 21843 processed
line 21844 processed
line 21845 processed
line 21846 processed
line 21847 processed
line 21848 processed
line 21849 processed
line 21850 processed
line 21851 processed
line 21852 processed
line 21853 processed
line 21854 processed
line 21855 pr

line 22213 processed
line 22214 processed
line 22215 processed
line 22216 processed
line 22217 processed
line 22218 processed
line 22219 processed
line 22220 processed
line 22221 processed
line 22222 processed
line 22223 processed
line 22224 processed
line 22225 processed
line 22226 processed
line 22227 processed
line 22228 processed
line 22229 processed
line 22230 processed
line 22231 processed
line 22232 processed
line 22233 processed
line 22234 processed
line 22235 processed
line 22236 processed
line 22237 processed
line 22238 processed
line 22239 processed
line 22240 processed
line 22241 processed
line 22242 processed
line 22243 processed
line 22244 processed
line 22245 processed
line 22246 processed
line 22247 processed
line 22248 processed
line 22249 processed
line 22250 processed
line 22251 processed
line 22252 processed
line 22253 processed
line 22254 processed
line 22255 processed
line 22256 processed
line 22257 processed
line 22258 processed
line 22259 processed
line 22260 pr

line 22607 processed
line 22608 processed
line 22609 processed
line 22610 processed
line 22611 processed
line 22612 processed
line 22613 processed
line 22614 processed
line 22615 processed
line 22616 processed
line 22617 processed
line 22618 processed
line 22619 processed
line 22620 processed
line 22621 processed
line 22622 processed
line 22623 processed
line 22624 processed
line 22625 processed
line 22626 processed
line 22627 processed
line 22628 processed
line 22629 processed
line 22630 processed
line 22631 processed
line 22632 processed
line 22633 processed
line 22634 processed
line 22635 processed
line 22636 processed
line 22637 processed
line 22638 processed
line 22639 processed
line 22640 processed
line 22641 processed
line 22642 processed
line 22643 processed
line 22644 processed
line 22645 processed
line 22646 processed
line 22647 processed
line 22648 processed
line 22649 processed
line 22650 processed
line 22651 processed
line 22652 processed
line 22653 processed
line 22654 pr

line 23022 processed
line 23023 processed
line 23024 processed
line 23025 processed
line 23026 processed
line 23027 processed
line 23028 processed
line 23029 processed
line 23030 processed
line 23031 processed
line 23032 processed
line 23033 processed
line 23034 processed
line 23035 processed
line 23036 processed
line 23037 processed
line 23038 processed
line 23039 processed
line 23040 processed
line 23041 processed
line 23042 processed
line 23043 processed
line 23044 processed
line 23045 processed
line 23046 processed
line 23047 processed
line 23048 processed
line 23049 processed
line 23050 processed
line 23051 processed
line 23052 processed
line 23053 processed
line 23054 processed
line 23055 processed
line 23056 processed
line 23057 processed
line 23058 processed
line 23059 processed
line 23060 processed
line 23061 processed
line 23062 processed
line 23063 processed
line 23064 processed
line 23065 processed
line 23066 processed
line 23067 processed
line 23068 processed
line 23069 pr

line 23416 processed
line 23417 processed
line 23418 processed
line 23419 processed
line 23420 processed
line 23421 processed
line 23422 processed
line 23423 processed
line 23424 processed
line 23425 processed
line 23426 processed
line 23427 processed
line 23428 processed
line 23429 processed
line 23430 processed
line 23431 processed
line 23432 processed
line 23433 processed
line 23434 processed
line 23435 processed
line 23436 processed
line 23437 processed
line 23438 processed
line 23439 processed
line 23440 processed
line 23441 processed
line 23442 processed
line 23443 processed
line 23444 processed
line 23445 processed
line 23446 processed
line 23447 processed
line 23448 processed
line 23449 processed
line 23450 processed
line 23451 processed
line 23452 processed
line 23453 processed
line 23454 processed
line 23455 processed
line 23456 processed
line 23457 processed
line 23458 processed
line 23459 processed
line 23460 processed
line 23461 processed
line 23462 processed
line 23463 pr

line 23829 processed
line 23830 processed
line 23831 processed
line 23832 processed
line 23833 processed
line 23834 processed
line 23835 processed
line 23836 processed
line 23837 processed
line 23838 processed
line 23839 processed
line 23840 processed
line 23841 processed
line 23842 processed
line 23843 processed
line 23844 processed
line 23845 processed
line 23846 processed
line 23847 processed
line 23848 processed
line 23849 processed
line 23850 processed
line 23851 processed
line 23852 processed
line 23853 processed
line 23854 processed
line 23855 processed
line 23856 processed
line 23857 processed
line 23858 processed
line 23859 processed
line 23860 processed
line 23861 processed
line 23862 processed
line 23863 processed
line 23864 processed
line 23865 processed
line 23866 processed
line 23867 processed
line 23868 processed
line 23869 processed
line 23870 processed
line 23871 processed
line 23872 processed
line 23873 processed
line 23874 processed
line 23875 processed
line 23876 pr

line 24229 processed
line 24230 processed
line 24231 processed
line 24232 processed
line 24233 processed
line 24234 processed
line 24235 processed
line 24236 processed
line 24237 processed
line 24238 processed
line 24239 processed
line 24240 processed
line 24241 processed
line 24242 processed
line 24243 processed
line 24244 processed
line 24245 processed
line 24246 processed
line 24247 processed
line 24248 processed
line 24249 processed
line 24250 processed
line 24251 processed
line 24252 processed
line 24253 processed
line 24254 processed
line 24255 processed
line 24256 processed
line 24257 processed
line 24258 processed
line 24259 processed
line 24260 processed
line 24261 processed
line 24262 processed
line 24263 processed
line 24264 processed
line 24265 processed
line 24266 processed
line 24267 processed
line 24268 processed
line 24269 processed
line 24270 processed
line 24271 processed
line 24272 processed
line 24273 processed
line 24274 processed
line 24275 processed
line 24276 pr

line 24630 processed
line 24631 processed
line 24632 processed
line 24633 processed
line 24634 processed
line 24635 processed
line 24636 processed
line 24637 processed
line 24638 processed
line 24639 processed
line 24640 processed
line 24641 processed
line 24642 processed
line 24643 processed
line 24644 processed
line 24645 processed
line 24646 processed
line 24647 processed
line 24648 processed
line 24649 processed
line 24650 processed
line 24651 processed
line 24652 processed
line 24653 processed
line 24654 processed
line 24655 processed
line 24656 processed
line 24657 processed
line 24658 processed
line 24659 processed
line 24660 processed
line 24661 processed
line 24662 processed
line 24663 processed
line 24664 processed
line 24665 processed
line 24666 processed
line 24667 processed
line 24668 processed
line 24669 processed
line 24670 processed
line 24671 processed
line 24672 processed
line 24673 processed
line 24674 processed
line 24675 processed
line 24676 processed
line 24677 pr

First, we separate the text column into its own numpy array. Then, we convert the numpy array into a list. This puts our data into a universal list format for when we clean the data.

Next, we initialize variables we need and start looping over the individual text descriptions.

We start by using regular expressions to get rid of punctuation and symbols, and we also make all of the text lower case. This makes sure repeated words are recognized in our text.

Next, we split our description string into words by splitting on the spaces between words. This allows us to modify individual words in our text.

Then, we remove stop words from our text and stem all our words. Stop words are words such as "and", "the", and "of" which do not have a lot of sentimental value to our machine learning. You can think of it as removing noise. However, you have to be careful. Removing stop words on a dataset with short phrases such as "On the blue river" and "Over the mountain" can cause a loss of important information.

When we stem all the words we are removing irrelevant part of words such as prefixes and suffixes which are similar. We use the Porter Stemmer, which is well known to be a good solution to stemming words. Stemming also tends to reduce the amount of words in our bag of words model later on, which can help lower the dimensionality of our input data. However, we need to be careful to not stem useful information like with removing stop words.

Finally, we join the modified words back together with spaces and keep track of our progress.

Next, we will vectorize our text descriptions and add our text features back into our input data.

In [7]:
text_array = np.array(text_list)
data_array = data.values

tfidf = TfidfVectorizer()
text_vect = tfidf.fit_transform(text_array)

print(data_array.shape)
print(text_vect.shape)

full_data = np.concatenate((text_vect.toarray(), data_array), axis=1)

print(full_data.shape)

(25000, 1649)
(25000, 16728)
(25000, 18377)


First, we convert the main dataframe into a numpy array and the modified text list we made into a numpy array. Then, we use sklearn's term frequency vectorizer to vectorize our text data. Rather than simply counting the words in each description, we weight words based on term frequency. This vectorizer keeps less useful repeated words from overshadowing more useful unique words.

Then, we append our vectorized text data onto our input data and observe the new shape of the data. The feature count may look very large compared to the number of examples. However, this is due to us having to cut down the number of examples from 150,000 to 25,000. The final feature count with 150,000 examples is only around 31,000, which suggests a good amount of repeated words.

You can find more information about tf-idf term weighting here: http://scikit-learn.org/stable/modules/feature_extraction.html

Next, we will split our data into train and test data concat the input data and labels together to fit EMADE's format.

In [8]:
X_train, X_test, y_train, y_test = train_test_split(full_data, labels.values, test_size=0.33)

data_train = np.concatenate((X_train, y_train), axis=1)
data_test = np.concatenate((X_test, y_test), axis=1)

print(data_train.shape)
print(data_test.shape)

(16750, 18378)
(8250, 18378)


We split the data into training and test data with 67% as training and 33% as testing data. Then, we append the labels columns to the end of the input data columns to put the data into the format EMADE expects.

Next, we will test out the performance of our dataset on a logistic regression classifier to get a rough benchmark. Logistic Regression works well on text datasets and any datasets with a lot of discrete and/or Boolean (1,0) values.

In [9]:
classifier = LogisticRegression()

classifier.fit(X_train, np.ravel(y_train))

predicted = classifier.predict(X_test)

print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(y_test, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(y_test, predicted))

Classification report for classifier LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False):
             precision    recall  f1-score   support

        0.0       0.89      0.96      0.93      6804
        1.0       0.72      0.46      0.56      1446

avg / total       0.86      0.87      0.86      8250


Confusion matrix:
[[6547  257]
 [ 785  661]]


Next, we will convert our train and test data back into pandas dataframes and export it in the proper format for EMADE.

In [10]:
train = pd.DataFrame(data_train)
test = pd.DataFrame(data_test)

divisor_train = math.ceil(len(train) / 5)
divisor_test = math.ceil(len(test) / 5)

count = 0
for g, df in train.groupby(np.arange(len(train)) // divisor_train):
    print(df.shape)

    np.savetxt("datasets/wine_data_set_train_%i.txt" % count, df.values, fmt='%.5f', delimiter=",")

    with open("datasets/wine_data_set_train_%i.txt" % count, "rb") as f_in, gzip.open("datasets/wine_data_set_train_%i.dat.gz" % count, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
        
    os.remove("datasets/wine_data_set_train_%i.txt" % count)

    count += 1
    
count = 0
for g, df in test.groupby(np.arange(len(test)) // divisor_test):
    print(df.shape)

    np.savetxt("datasets/wine_data_set_test_%i.txt" % count, df.values, fmt='%.5f', delimiter=",")

    with open("datasets/wine_data_set_test_%i.txt" % count, "rb") as f_in, gzip.open("datasets/wine_data_set_test_%i.dat.gz" % count, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
        
    os.remove("datasets/wine_data_set_test_%i.txt" % count)

    count += 1
    
small_train = train.drop(train.index[1675:])
small_test = test.drop(test.index[825:])

np.savetxt("datasets/wine_data_set_train_small.txt", small_train.values, fmt='%.5f', delimiter=",")

with open("datasets/wine_data_set_train_small.txt", "rb") as f_in, gzip.open("datasets/wine_data_set_train_small.dat.gz", "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)
    
os.remove("datasets/wine_data_set_train_small.txt")
              
np.savetxt("datasets/wine_data_set_test_small.txt", small_train.values, fmt='%.5f', delimiter=",")

with open("datasets/wine_data_set_test_small.txt", "rb") as f_in, gzip.open("datasets/wine_data_set_test_small.dat.gz", "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)
    
os.remove("datasets/wine_data_set_test_small.txt")

(3350, 18378)
(3350, 18378)
(3350, 18378)
(3350, 18378)
(3350, 18378)
(1650, 18378)
(1650, 18378)
(1650, 18378)
(1650, 18378)
(1650, 18378)


We split the dataset into 5 chunks each with 20% of the data. Then, we set aside one chunk of 10% to be our small dataset in EMADE.

The final step for preprocessing is to compress all the .dat files into .dat.gz files and place them in a folder under the datasets directory of EMADE.

There are other datasets already implemented into EMADE in this format.

# XML Formatting

Below is an example xml template for the dataset we preprocessed. This template is used to setup parameters for EMADE. You can change the objectives, crossover probability, and mutation probability from here.

Make sure to make an xml file and store it in the templates directory when you implement a new dataset into EMADE.

## Note: Do NOT run the code cell below

In [11]:
<?xml version="1.0"?>

<input>

    <datasets>
        <dataset>
            <name>SmallDataSet</name>
            <type>featuredata</type>
            <MonteCarlo>
                <trial>
                    <trainFilename>datasets/wine/wine_data_set_train_small.dat.gz</trainFilename>
                    <testFilename>datasets/wine/wine_data_set_test_small.dat.gz</testFilename>
                </trial>
            </MonteCarlo>
        </dataset>
        <dataset>
            <name>FullDataSet</name>
            <type>featuredata</type>
            <MonteCarlo>
                <trial>
                    <trainFilename>datasets/wine/wine_data_set_train_0.dat.gz</trainFilename>
                    <testFilename>datasets/wine/wine_data_set_test_0.dat.gz</testFilename>
                </trial>
                <trial>
                    <trainFilename>datasets/wine/wine_data_set_train_1.dat.gz</trainFilename>
                    <testFilename>datasets/wine/wine_data_set_test_1.dat.gz</testFilename>
                </trial>
                <trial>
                    <trainFilename>datasets/wine/wine_data_set_train_2.dat.gz</trainFilename>
                    <testFilename>datasets/wine/wine_data_set_test_2.dat.gz</testFilename>
                </trial>
                <trial>
                    <trainFilename>datasets/wine/wine_data_set_train_3.dat.gz</trainFilename>
                    <testFilename>datasets/wine/wine_data_set_test_3.dat.gz</testFilename>
                </trial>
                <trial>
                    <trainFilename>datasets/wine/wine_data_set_train_4.dat.gz</trainFilename>
                    <testFilename>datasets/wine/wine_data_set_test_4.dat.gz</testFilename>
                </trial>
            </MonteCarlo>
        </dataset>
    </datasets>

    <objectives>
        <objective>
            <name>False Positives</name>
            <weight>-1.0</weight>
            <achievable>4971.8</achievable>
            <goal>0</goal>
            <evaluationFunction>false_positive</evaluationFunction>
            <lower>0</lower>
            <upper>1</upper>
        </objective>
        <objective>
            <name>False Negatives</name>
            <weight>-1.0</weight>
            <achievable>1541.2</achievable>
            <goal>0</goal>
            <evaluationFunction>false_negative</evaluationFunction>
            <lower>0</lower>
            <upper>1</upper>
        </objective>
        <objective>
            <name>F1-Score</name>
            <weight>-1.0</weight>
            <achievable>0.2</achievable>
            <goal>0</goal>
            <evaluationFunction>f1_score_min</evaluationFunction>
            <lower>0</lower>
            <upper>1</upper>
        </objective>
        <objective>
            <name>Num Elements</name>
            <weight>-1.0</weight>
            <achievable>100.0</achievable>
            <goal>0</goal>
            <evaluationFunction>num_elements_eval_function</evaluationFunction>
            <lower>0</lower>
            <upper>1</upper>
        </objective>

    </objectives>

    <evaluation>
        <module>evalFunctions</module>
        <memoryLimit>30</memoryLimit> <!-- In Percent -->
    </evaluation>

    <scoopParameters>
        <host>
            <name>localhost</name>
            <workers>24</workers>
        </host>
        <!--<host>
            <name>localhost</name>
            <workers>3</workers>
        </host>
        <host>
            <name>localhost</name>
            <workers>3</workers>
        </host>
        <host>
            <name>localhost</name>
            <workers>3</workers>
        </host>
        <host>
            <name>localhost</name>
            <workers>3</workers>
        </host>
        <host>
            <name>localhost</name>
            <workers>3</workers>
        </host>-->
    </scoopParameters>

    <evolutionParameters>
        <initialPopulationSize>512</initialPopulationSize>
        <elitePoolSize>512</elitePoolSize>
        <launchSize>300</launchSize>
        <minQueueSize>200</minQueueSize>

        <matings>
            <mating>
                <name>crossover</name>
                <probability>0.50</probability>
            </mating>
            <mating>
                <name>crossoverEphemeral</name>
                <probability>0.50</probability>
            </mating>
            <mating>
                <name>headlessChicken</name>
                <probability>0.10</probability>
            </mating>
            <mating>
                <name>headlessChickenEphemeral</name>
                <probability>0.10</probability>
            </mating>
        </matings>

        <mutations>
            <mutation>
                <name>insert</name>
                <probability>0.05</probability>
            </mutation>
            <mutation>
                <name>insert modify</name>
                <probability>0.10</probability>
            </mutation>
            <mutation>
                <name>ephemeral</name>
                <probability>0.25</probability>
            </mutation>
            <mutation>
                <name>node replace</name>
                <probability>0.05</probability>
            </mutation>
            <mutation>
                <name>uniform</name>
                <probability>0.05</probability>
            </mutation>
            <mutation>
                <name>shrink</name>
                <probability>0.05</probability>
            </mutation>
        </mutations>

        <selections>
            <selection>
                <name>NSGA2</name>
            </selection>
        </selections>

    </evolutionParameters>

    <seedFile>

    </seedFile>

    <genePoolFitness>
        <prefix>genePoolFitnessWine</prefix>
    </genePoolFitness>
    <paretoFitness>
        <prefix>paretoFitnessWine</prefix>
    </paretoFitness>
    <parentsOutput>
        <prefix>parentsWine</prefix>
    </parentsOutput>



    <paretoOutput>
        <prefix>paretoFrontWine</prefix>
    </paretoOutput>
</input>

SyntaxError: invalid syntax (<ipython-input-11-ef8ab20c933c>, line 1)

In [None]:
import nltk

In [None]:
nltk.download()