# Python Beginners Workshop - Day 3

# Session 1: Pandas

## Learning Goals

- What is Pandas? And why?
- Loading data and creating Pandas Series and DataFrame objects
- Data Manipulation (values, index, and columns)
- Data Selection (indexing, masking, and splitting)
- Data Transformation
- <font color="red">Dealing with missing data</font>
- Combining DataFrames
- Split-Apply-Combine
- Saving Data 

What's missing:
- <font color="red">bin rows based on the value of one attribute (`pandas.cut`)</font>

---

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Motivation

In [2]:
an_arr = np.arange(20).reshape(5, -1)
an_arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [3]:
pd.DataFrame(an_arr, 
             columns=['A', 'B', 'C', 'D'])

,A,B,C,D
0,0,1,2,3
1,4,5,6,7
2,8,9,10,11
3,12,13,14,15
4,16,17,18,19


## What is Pandas? And Why?

**pandas** is a Python package providing fast, flexible, and expressive data structures designed to make working with both *relational* and *labeled* data easy and intuitive. It stores data in tabular form. In a relational database, each table (sometimes called a relation) contains one or more data categories in columns, also called attributes. Each row, also called a record or tuple, contains a unique instance of data, identified by a key, for the categories defined by the columns.

It is a fundamental high-level building block for doing practical, real world data analysis in Python. 

pandas is well suited for:

- Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data.
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
- Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure


Key features:
    
- Easy handling of **missing data**
- **Size mutability**: columns can be inserted and deleted from DataFrame and higher dimensional objects
- Automatic and explicit **data alignment**: objects can be explicitly aligned to a set of labels, or the data can be aligned automatically
- Powerful, flexible **group by functionality** to perform split-apply-combine operations on data sets
- Intelligent label-based **slicing, fancy indexing, and subsetting** of large data sets
- Intuitive **merging and joining** data sets
- Flexible **reshaping and pivoting** of data sets
- **Hierarchical labeling** of axes
- Robust **IO tools** for loading data from flat files, Excel files, databases, and HDF5
- **Time series functionality**: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.

### Pandas Series

A pandas `Series` is a generalization of a one-dimensional array with flexible indexing (one index label per element):

In [4]:
ss = pd.Series([1, 2, 3, 4])
ss

0    1
1    2
2    3
3    4
dtype: int64

If an index is not specified, a default sequence of integers is assigned as the index. A NumPy array comprises the values of the `Series`, while the index is a pandas `Index` object.
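
As a small illustration of the two components described above, the sketch below builds a default-indexed `Series` and looks at its values and its index separately. The name `s_demo` is only used for this illustration.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([1, 2, 3, 4])

values = s_demo.to_numpy()   # the underlying data as a NumPy array
index = s_demo.index         # a pandas Index object (a RangeIndex by default)

print(type(values))
print(index)
```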

In [5]:
ss = pd.Series([25, 30, 33, 28], 
               index=['Mohammad', 'Nick', 'Hana', 'Anna'])
ss

Mohammad    25
Nick        30
Hana        33
Anna        28
dtype: int64

In [6]:
ss = pd.Series({'Mohammad': 25, 
                'Nick':30, 
                'Hana':33, 
                'Anna': 28})
ss

Mohammad    25
Nick        30
Hana        33
Anna        28
dtype: int64

These labels can be used to refer to the values in the `Series`.

In [7]:
ss['Mohammad']

25

In [8]:
ss[['Nick', 'Hana', 'Anna']]

Nick    30
Hana    33
Anna    28
dtype: int64

In [9]:
ss['Nick':'Anna']

Nick    30
Hana    33
Anna    28
dtype: int64

In [10]:
ss['Nick':'Anna':2]

Nick    30
Anna    28
dtype: int64

In [11]:
ss[[False, True, True, True]]

Nick    30
Hana    33
Anna    28
dtype: int64

Notice that the indexing operation preserved the association between the values and the corresponding indices.

We can still use positional indexing if we wish.
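
A minimal sketch of positional indexing, assuming the same ages as above (the variable name `ages` is chosen here for illustration). It uses `.iloc` for positions and `.loc` for labels, which keeps the two styles explicit:

```python
import pandas as pd

ages = pd.Series([25, 30, 33, 28],
                 index=['Mohammad', 'Nick', 'Hana', 'Anna'])

first = ages.iloc[0]                 # first element, selected by position
last_two = ages.iloc[-2:]            # positional slice: the end position is excluded
by_label = ages.loc['Nick':'Hana']   # label slice: both endpoints are included

print(first)
print(last_two)
print(by_label)
```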

### Pandas DataFrame

Inevitably, we want to be able to store, view and manipulate data that is *multivariate*, where for every index there are multiple fields or columns of data, often of varying data type.

A `DataFrame` is a tabular data structure that encapsulates multiple Series, like columns in a spreadsheet. A pandas `DataFrame` is a generalization of a two-dimensional array with flexible indexing. We can create a DataFrame directly from a dictionary, or from Series:

In [12]:
age = {'Mohammad': 25, 'Nick':30, 'Hana':33, 'Anna': 28}
weight = {'Mohammad': 90, 'Nick': 80,'Hana': 50, 'Anna':55}

In [25]:
df = pd.DataFrame({'age': age, 'weight': weight})

In [26]:
df

,age,weight
Anna,28,55
Hana,33,50
Mohammad,25,90
Nick,30,80


In [27]:
df.index

Index(['Anna', 'Hana', 'Mohammad', 'Nick'], dtype='object')

In [28]:
df.columns

Index(['age', 'weight'], dtype='object')
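
The text above mentions that a DataFrame can also be built from Series. The sketch below is one way to do that; the names `age_s`, `weight_s`, and `people` are made up for this example, and the two Series deliberately do not share all labels so that the automatic alignment becomes visible.

```python
import pandas as pd

age_s = pd.Series({'Mohammad': 25, 'Nick': 30, 'Hana': 33})
weight_s = pd.Series({'Nick': 80, 'Hana': 50, 'Anna': 55})

# Rows are aligned on the index labels; labels missing from one Series get NaN
people = pd.DataFrame({'age': age_s, 'weight': weight_s})
print(people)
```

Because of the alignment, `people` has one row per label that appears in either Series.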

### Other ways of creating Pandas DataFrame

Directly from a dictionary

In [29]:
df = pd.DataFrame({'age': [25, 30, 33, 28], 
                   'weight': [90, 80, 50, 55]})
df

,age,weight
0,25,90
1,30,80
2,33,50
3,28,55


In [30]:
df = pd.DataFrame({'age': [25, 30, 33, 28], 
                   'weight': [90, 80, 50, 55]}, 
                  index=['Mohammad', 'Nick', 'Hana', 'Anna'])
df

,age,weight
Mohammad,25,90
Nick,30,80
Hana,33,50
Anna,28,55


In [31]:
df = pd.DataFrame([[25, 90], 
                   [30, 80], 
                   [33, 50], 
                   [28, 55]], 
                  columns= ['Age', 'Weight'],
                  index=['Mohammad', 'Nick', 'Hana', 'Anna'])
df

,Age,Weight
Mohammad,25,90
Nick,30,80
Hana,33,50
Anna,28,55


### Import data from csv file

In [32]:
pd.read_csv("csv_sample.csv", index_col=0)

,A,B,C
D,1,4,7
E,2,5,8
F,3,6,9


### Import data from Excel sheet

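**NOTE**: Reading Excel files requires an extra library. Recent pandas versions use `openpyxl` for `.xlsx` files; `xlrd` is needed for legacy `.xls` files (and was used for both in older pandas versions).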

In [33]:
pd.read_excel("excel_sample.xlsx")

,Unnamed: 0,A,B,C
0,D,1,4,7
1,E,2,5,8
2,F,3,6,9


### Import data from URL

 <font color="red">Discuss with Nick about a reasonable dataset to directly import from the net and work on</font> 

In [34]:
df15 = pd.read_csv("data/happiness/2015.csv", index_col=0)
df16 = pd.read_csv("data/happiness/2016.csv", index_col=0)
df17 = pd.read_csv("data/happiness/2017.csv", index_col=0)

In [55]:
df15.drop(columns='Standard Error', inplace=True)

In [49]:
df16.drop(columns=['Lower Confidence Interval', 'Upper Confidence Interval'], inplace=True)

In [60]:
df17.drop(columns=['Whisker.high', 'Whisker.low'], inplace=True)

In [56]:
df15.columns.isin(df16.columns)

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True])

In [63]:
df15['Year'] = 2015
df16['Year'] = 2016

In [65]:
df = pd.concat([df15, df16])

In [68]:
df.tail()

Country,Region,Happiness Rank,Happiness Score,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual,Year
Benin,Sub-Saharan Africa,153,3.484,0.39499,0.10419,0.21028,0.39747,0.06681,0.2018,2.10812,2016
Afghanistan,Southern Asia,154,3.36,0.38227,0.11037,0.17344,0.1643,0.07112,0.31268,2.14558,2016
Togo,Sub-Saharan Africa,155,3.303,0.28123,0.0,0.24811,0.34678,0.11587,0.17517,2.1354,2016
Syria,Middle East and Northern Africa,156,3.069,0.74719,0.14866,0.62994,0.06912,0.17233,0.48397,0.81789,2016
Burundi,Sub-Saharan Africa,157,2.905,0.06831,0.23442,0.15747,0.0432,0.09419,0.2029,2.10404,2016


In [69]:
df.drop(columns='Region', inplace=True)

In [76]:
df.columns

Index(['Happiness Rank', 'Happiness Score', 'Economy (GDP per Capita)',
       'Family', 'Health (Life Expectancy)', 'Freedom',
       'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual',
       'Year'],
      dtype='object')

In [77]:
df17['Year'] = 2017

In [87]:
" ".join(['Hello', 'Bye'])

'Hello Bye'

In [94]:
old_col = list(df17.columns)
new_col = [' '.join(el.split('.')) for el in list(df17.columns)]

In [95]:
new_col[2] = 'Economy (GDP per Capita)'
new_col[4] = 'Health (Life Expectancy)'
new_col[7] = 'Trust (Government Corruption)'
new_col

['Happiness Rank',
 'Happiness Score',
 'Economy (GDP per Capita)',
 'Family',
 'Health (Life Expectancy)',
 'Freedom',
 'Generosity',
 'Trust (Government Corruption)',
 'Dystopia Residual',
 'Year']

In [96]:
old_col

['Happiness.Rank',
 'Happiness.Score',
 'Economy..GDP.per.Capita.',
 'Family',
 'Health..Life.Expectancy.',
 'Freedom',
 'Generosity',
 'Trust..Government.Corruption.',
 'Dystopia.Residual',
 'Year']

In [99]:
new_col_dict = {el_old: el_new for el_new, el_old in zip(new_col, old_col)}

In [100]:
new_col_dict

{'Happiness.Rank': 'Happiness Rank',
 'Happiness.Score': 'Happiness Score',
 'Economy..GDP.per.Capita.': 'Economy (GDP per Capita)',
 'Family': 'Family',
 'Health..Life.Expectancy.': 'Health (Life Expectancy)',
 'Freedom': 'Freedom',
 'Generosity': 'Generosity',
 'Trust..Government.Corruption.': 'Trust (Government Corruption)',
 'Dystopia.Residual': 'Dystopia Residual',
 'Year': 'Year'}

In [102]:
df17.rename(columns=new_col_dict, inplace=True)

In [103]:
df17.columns.isin(df.columns)

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True])
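
Once the column names match, the yearly tables can be stacked with `pd.concat`, as was done for 2015 and 2016 above. The sketch below shows the idea on two tiny made-up frames (the country names and scores are illustrative only, not taken from the happiness data):

```python
import pandas as pd

# Tiny made-up frames standing in for two yearly tables
a = pd.DataFrame({'Happiness Score': [7.5, 7.4], 'Year': 2015},
                 index=['Country A', 'Country B'])
b = pd.DataFrame({'Year': 2017, 'Happiness Score': [7.6, 7.3]},
                 index=['Country A', 'Country B'])

# concat aligns columns by name, so the different column order in b does not matter
combined = pd.concat([a, b])
print(combined)
```

This is why the renaming step matters: columns whose names differ would end up as separate, partly empty columns instead of being stacked.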

## Writing Data to Files

In [167]:
new_dd.to_csv("hello.csv")
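
A self-contained sketch of a write-and-read round trip, assuming a small made-up table (`table`) and a temporary file path instead of `hello.csv`, so the example leaves no files behind:

```python
import os
import tempfile

import pandas as pd

table = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                     index=['D', 'E', 'F'])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.csv')
    table.to_csv(path)                          # the index is written as the first column
    restored = pd.read_csv(path, index_col=0)   # use that first column as the index again

print(restored)
```

Passing `index_col=0` when reading mirrors what `to_csv` does by default when writing; without it, the saved index would come back as an ordinary column.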

We have already seen `.index` as well as `.columns`. However, pandas provides some more handy methods and attributes for us to inspect our DataFrame:

---

### Exercise
1. Create a 20-by-3 pandas DataFrame using the NumPy random module, and
2. Try out these commands to see what they return:
    - `data.head()`
    - `data.tail(3)`
    - `data.shape`

---

In [43]:
dd.head(10)

id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,
843786,M,12.45,15.7,82.57,477.1,0.1278,0.17,0.1578,0.08089,0.2087,...,23.75,103.4,741.6,0.1791,0.5249,0.5355,0.1741,0.3985,0.1244,
844359,M,18.25,19.98,119.6,1040.0,0.09463,0.109,0.1127,0.074,0.1794,...,27.66,153.2,1606.0,0.1442,0.2576,0.3784,0.1932,0.3063,0.08368,
84458202,M,13.71,20.83,90.2,577.9,0.1189,0.1645,0.09366,0.05985,0.2196,...,28.14,110.6,897.0,0.1654,0.3682,0.2678,0.1556,0.3196,0.1151,
844981,M,13.0,21.82,87.5,519.8,0.1273,0.1932,0.1859,0.09353,0.235,...,30.73,106.2,739.3,0.1703,0.5401,0.539,0.206,0.4378,0.1072,
84501001,M,12.46,24.04,83.97,475.9,0.1186,0.2396,0.2273,0.08543,0.203,...,40.68,97.65,711.4,0.1853,1.058,1.105,0.221,0.4366,0.2075,


In [44]:
dd.tail(7)

id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
925622,M,15.22,30.62,103.4,716.9,0.1048,0.2087,0.255,0.09429,0.2128,...,42.79,128.7,915.0,0.1417,0.7917,1.17,0.2356,0.4089,0.1409,
926125,M,20.92,25.09,143.0,1347.0,0.1099,0.2236,0.3174,0.1474,0.2149,...,29.41,179.1,1819.0,0.1407,0.4186,0.6599,0.2542,0.2929,0.09873,
926424,M,21.56,22.39,142.0,1479.0,0.111,0.1159,0.2439,0.1389,0.1726,...,26.4,166.1,2027.0,0.141,0.2113,0.4107,0.2216,0.206,0.07115,
926682,M,20.13,28.25,131.2,1261.0,0.0978,0.1034,0.144,0.09791,0.1752,...,38.25,155.0,1731.0,0.1166,0.1922,0.3215,0.1628,0.2572,0.06637,
926954,M,16.6,28.08,108.3,858.1,0.08455,0.1023,0.09251,0.05302,0.159,...,34.12,126.7,1124.0,0.1139,0.3094,0.3403,0.1418,0.2218,0.0782,
927241,M,20.6,29.33,140.1,1265.0,0.1178,0.277,0.3514,0.152,0.2397,...,39.42,184.6,1821.0,0.165,0.8681,0.9387,0.265,0.4087,0.124,
92751,B,7.76,24.54,47.92,181.0,0.05263,0.04362,0.0,0.0,0.1587,...,30.37,59.16,268.6,0.08996,0.06444,0.0,0.0,0.2871,0.07039,


In [45]:
dd.describe()

,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
count,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,...,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,0.0
mean,14.127292,19.289649,91.969033,654.889104,0.09636,0.104341,0.088799,0.048919,0.181162,0.062798,...,25.677223,107.261213,880.583128,0.132369,0.254265,0.272188,0.114606,0.290076,0.083946,
std,3.524049,4.301036,24.298981,351.914129,0.014064,0.052813,0.07972,0.038803,0.027414,0.00706,...,6.146258,33.602542,569.356993,0.022832,0.157336,0.208624,0.065732,0.061867,0.018061,
min,6.981,9.71,43.79,143.5,0.05263,0.01938,0.0,0.0,0.106,0.04996,...,12.02,50.41,185.2,0.07117,0.02729,0.0,0.0,0.1565,0.05504,
25%,11.7,16.17,75.17,420.3,0.08637,0.06492,0.02956,0.02031,0.1619,0.0577,...,21.08,84.11,515.3,0.1166,0.1472,0.1145,0.06493,0.2504,0.07146,
50%,13.37,18.84,86.24,551.1,0.09587,0.09263,0.06154,0.0335,0.1792,0.06154,...,25.41,97.66,686.5,0.1313,0.2119,0.2267,0.09993,0.2822,0.08004,
75%,15.78,21.8,104.1,782.7,0.1053,0.1304,0.1307,0.074,0.1957,0.06612,...,29.72,125.4,1084.0,0.146,0.3391,0.3829,0.1614,0.3179,0.09208,
max,28.11,39.28,188.5,2501.0,0.1634,0.3454,0.4268,0.2012,0.304,0.09744,...,49.54,251.2,4254.0,0.2226,1.058,1.252,0.291,0.6638,0.2075,


In [47]:
dd.shape

(569, 32)

In [48]:
len(dd)

569

### Column-wise indexing

In [50]:
dd[['diagnosis', 'radius_mean']].head()

id,diagnosis,radius_mean
842302,M,17.99
842517,M,20.57
84300903,M,19.69
84348301,M,11.42
84358402,M,20.29


In [53]:
dd.columns[:5]

Index(['diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean',
       'area_mean'],
      dtype='object')

In [57]:
dd[dd.columns[:5]].head()

id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean
842302,M,17.99,10.38,122.8,1001.0
842517,M,20.57,17.77,132.9,1326.0
84300903,M,19.69,21.25,130.0,1203.0
84348301,M,11.42,20.38,77.58,386.1
84358402,M,20.29,14.34,135.1,1297.0


### Row-wise indexing

`.loc` and `.iloc`

In [64]:
dd.loc[842302: 84358402]

id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,


In [65]:
dd.iloc[0: 5]

id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,
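
The two cells above return the same five rows even though the slices look different. A minimal sketch of why, using a small made-up frame (`frame` and its index values are hypothetical): `.loc` slices by label and includes both endpoints, while `.iloc` slices by position and excludes the end.

```python
import pandas as pd

frame = pd.DataFrame({'value': [10, 20, 30, 40]},
                     index=[101, 205, 303, 404])

by_label = frame.loc[101:303]    # labels 101 through 303, both included
by_position = frame.iloc[0:3]    # positions 0, 1, 2 (position 3 is excluded)

print(by_label)
print(by_position)
```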


### Selecting specific rows and columns

In [66]:
dd.loc[842302, 'diagnosis']

'M'

In [69]:
dd.iloc[0]['diagnosis']

'M'
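
Both indexers also accept a row selector and a column selector in a single call, which avoids chaining two indexing steps. The sketch below assumes a small made-up frame (`frame`, with hypothetical ids 11, 22, 33):

```python
import pandas as pd

frame = pd.DataFrame({'diagnosis': ['M', 'B', 'M'],
                      'radius': [17.99, 7.76, 20.57]},
                     index=[11, 22, 33])

cell_by_label = frame.loc[11, 'diagnosis']    # row label, column label
cell_by_position = frame.iloc[0, 0]           # row position, column position
subset = frame.loc[[11, 33], ['radius']]      # lists select several rows/columns

print(cell_by_label, cell_by_position)
print(subset)
```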

In [70]:
dd.head()

id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,


### Data Inspection

In [241]:
dd.dtypes

diagnosis                   object
radius_mean                float64
texture_mean               float64
perimeter_mean             float64
area_mean                  float64
smoothness_mean            float64
compactness_mean           float64
concavity_mean             float64
concave points_mean        float64
symmetry_mean              float64
fractal_dimension_mean     float64
radius_se                  float64
texture_se                 float64
perimeter_se               float64
area_se                    float64
smoothness_se              float64
compactness_se             float64
concavity_se               float64
concave points_se          float64
symmetry_se                float64
fractal_dimension_se       float64
radius_worst               float64
texture_worst              float64
perimeter_worst            float64
area_worst                 float64
smoothness_worst           float64
compactness_worst          float64
concavity_worst            float64
concave points_worst

In [242]:
dd.shape

(569, 32)

### Data Manipulation

#### Manipulating values

It's important to note that the `Series`/`DataFrame` we get when we index our original DataFrame may still share its values with the original DataFrame, so changes could also take effect on the original (so be cautious when manipulating this data). The safe way to do this is to copy the DataFrame first using the `.copy()` method.

In [244]:
dd.area_mean = 0
dd.head()

id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
842302,M,17.99,10.38,122.8,0,0.1184,0.2776,0.3001,0.1471,0.2419,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
842517,M,20.57,17.77,132.9,0,0.08474,0.07864,0.0869,0.07017,0.1812,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
84300903,M,19.69,21.25,130.0,0,0.1096,0.1599,0.1974,0.1279,0.2069,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
84348301,M,11.42,20.38,77.58,0,0.1425,0.2839,0.2414,0.1052,0.2597,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
84358402,M,20.29,14.34,135.1,0,0.1003,0.1328,0.198,0.1043,0.1809,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,
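
The cell above overwrites `area_mean` in `dd` itself. A minimal sketch of the safer pattern, using a small made-up frame (`original` and `working` are names chosen for this example): make an explicit copy first and modify only the copy.

```python
import pandas as pd

original = pd.DataFrame({'area': [1001.0, 1326.0],
                         'radius': [17.99, 20.57]})

working = original.copy()   # independent copy of the data
working['area'] = 0         # only the copy is changed

print(original)
print(working)
```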


#### Manipulating Indices

We can also change the index if we are not happy with the current one:

In [224]:
dd.index = dd.radius_mean

**Reindexing** allows users to manipulate the data labels in a DataFrame. It forces a DataFrame to conform to the new index, and optionally, fill in missing data if requested.

A simple use of `reindex` is to alter the order of the rows:

In [251]:
dd.reindex(dd.index[::-1].copy()).head()

id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
92751,B,7.76,24.54,47.92,181.0,0.05263,0.04362,0.0,0.0,0.1587,...,30.37,59.16,268.6,0.08996,0.06444,0.0,0.0,0.2871,0.07039,
927241,M,20.6,29.33,140.1,1265.0,0.1178,0.277,0.3514,0.152,0.2397,...,39.42,184.6,1821.0,0.165,0.8681,0.9387,0.265,0.4087,0.124,
926954,M,16.6,28.08,108.3,858.1,0.08455,0.1023,0.09251,0.05302,0.159,...,34.12,126.7,1124.0,0.1139,0.3094,0.3403,0.1418,0.2218,0.0782,
926682,M,20.13,28.25,131.2,1261.0,0.0978,0.1034,0.144,0.09791,0.1752,...,38.25,155.0,1731.0,0.1166,0.1922,0.3215,0.1628,0.2572,0.06637,
926424,M,21.56,22.39,142.0,1479.0,0.111,0.1159,0.2439,0.1389,0.1726,...,26.4,166.1,2027.0,0.141,0.2113,0.4107,0.2216,0.206,0.07115,
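
Reindexing can do more than reorder rows. The sketch below, on a small made-up frame (`frame` and its labels are hypothetical), shows what happens when the new index contains a label that does not exist yet, and how the optional `fill_value` argument fills the gap:

```python
import pandas as pd

frame = pd.DataFrame({'value': [1, 2, 3]}, index=['a', 'b', 'c'])

reordered = frame.reindex(['c', 'a', 'd'])             # 'd' is new, so its row holds NaN
filled = frame.reindex(['c', 'a', 'd'], fill_value=0)  # fill the new row with 0 instead

print(reordered)
print(filled)
```

Labels that are left out of the new index (here `'b'`) are dropped from the result; `frame` itself is not modified.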


We can also remove (`drop`) rows

In [252]:
dd.head()

id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,


In [260]:
dd.drop([842302, 84348301]).head()

id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,
843786,M,12.45,15.7,82.57,477.1,0.1278,0.17,0.1578,0.08089,0.2087,...,23.75,103.4,741.6,0.1791,0.5249,0.5355,0.1741,0.3985,0.1244,
844359,M,18.25,19.98,119.6,1040.0,0.09463,0.109,0.1127,0.074,0.1794,...,27.66,153.2,1606.0,0.1442,0.2576,0.3784,0.1932,0.3063,0.08368,
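
Note that `drop` returns a new DataFrame and leaves the original untouched unless `inplace=True` is passed, as in the happiness example earlier. A minimal sketch on a made-up frame (`frame` is hypothetical):

```python
import pandas as pd

frame = pd.DataFrame({'value': [1, 2, 3], 'extra': [4, 5, 6]},
                     index=[10, 20, 30])

fewer_rows = frame.drop([10, 30])            # drop rows by index label
fewer_cols = frame.drop(columns='extra')     # drop a column by name

print(fewer_rows)
print(fewer_cols)
print(frame.shape)   # the original frame is unchanged
```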


#### Manipulating Columns

One can easily create a new column:

In [408]:
dd['year'] = 2019

**Note**: we cannot use attribute access to add a new column. The assignment below only sets a Python attribute on the object, so the DataFrame gains no `month` column:

In [227]:
dd.month = 'March'

In [228]:
dd.head()

radius_mean,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32,year
17.99,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,,2019
20.57,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,,2019
19.69,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,,2019
11.42,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,,2019
20.29,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,,2019
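
The output above contains `year` but no `month` column. The sketch below reproduces this on a small made-up frame (`frame` is hypothetical); the warning filter is there because pandas may emit a warning for this pattern.

```python
import warnings

import pandas as pd

frame = pd.DataFrame({'x': [1, 2]})

with warnings.catch_warnings():
    warnings.simplefilter('ignore')   # silence the warning pandas may raise here
    frame.month = 'March'             # sets a plain attribute, not a column

frame['year'] = 2019                  # bracket assignment creates a real column

print(list(frame.columns))
```

Attribute access remains convenient for reading or overwriting an existing column, but new columns need the bracket syntax.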


Re-ordering the columns simply means selecting them in the order that we prefer. Here is an example of reversing the columns:

In [88]:
dd[dd.columns[::-1]].head()

id,Unnamed: 32,fractal_dimension_worst,symmetry_worst,concave points_worst,concavity_worst,compactness_worst,smoothness_worst,area_worst,perimeter_worst,texture_worst,...,symmetry_mean,concave points_mean,concavity_mean,compactness_mean,smoothness_mean,area_mean,perimeter_mean,texture_mean,radius_mean,diagnosis
842302,,0.1189,0.4601,0.2654,0.7119,0.6656,0.1622,2019.0,184.6,17.33,...,0.2419,0.1471,0.3001,0.2776,0.1184,1001.0,122.8,10.38,17.99,M
842517,,0.08902,0.275,0.186,0.2416,0.1866,0.1238,1956.0,158.8,23.41,...,0.1812,0.07017,0.0869,0.07864,0.08474,1326.0,132.9,17.77,20.57,M
84300903,,0.08758,0.3613,0.243,0.4504,0.4245,0.1444,1709.0,152.5,25.53,...,0.2069,0.1279,0.1974,0.1599,0.1096,1203.0,130.0,21.25,19.69,M
84348301,,0.173,0.6638,0.2575,0.6869,0.8663,0.2098,567.7,98.87,26.5,...,0.2597,0.1052,0.2414,0.2839,0.1425,386.1,77.58,20.38,11.42,M
84358402,,0.07678,0.2364,0.1625,0.4,0.205,0.1374,1575.0,152.2,16.67,...,0.1809,0.1043,0.198,0.1328,0.1003,1297.0,135.1,14.34,20.29,M


How to rename a column?

In [267]:
dd.head()

id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,


In [269]:
# rename returns a new DataFrame; dd itself keeps its original column names
renamed = dd.rename(columns={'diagnosis':'A',
                             'radius_mean':'B',
                             'perimeter_mean':'C'})

In [270]:
renamed.head()

Unnamed: 0_level_0,A,B,texture_mean,C,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,
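
`rename` returns a new `DataFrame` rather than changing the one it is called on, so the result has to be stored in a variable to be used later. A minimal sketch on a hypothetical two-row frame (`toy` and the new name `label` are made up for illustration, not part of the workshop data):

```python
import pandas as pd

# Hypothetical toy frame standing in for the workshop's `dd`
toy = pd.DataFrame({"diagnosis": ["M", "B"], "radius_mean": [17.99, 13.54]})

# rename hands back a new DataFrame; `toy` is not modified
renamed = toy.rename(columns={"diagnosis": "label"})

print(list(toy.columns))      # the original column names
print(list(renamed.columns))  # the renamed copy
```

Columns that are not mentioned in the mapping keep their names.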


We can remove columns the same way as we did rows:

In [263]:
dd.drop(['diagnosis', 'radius_mean'], axis=1).head()

Unnamed: 0_level_0,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean,radius_se,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
842302,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
842517,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,0.5435,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
84300903,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,0.7456,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
84348301,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,0.4956,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
84358402,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,0.7572,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,
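
As a small sketch of how `drop` treats rows and columns, the example below uses a hypothetical three-row frame (`toy` is an assumption, with values copied from the table above). `axis=1` and the `columns=` keyword should be two spellings of the same operation, while the default `axis=0` drops rows by index label:

```python
import pandas as pd

# Hypothetical toy frame indexed by patient id, standing in for `dd`
toy = pd.DataFrame(
    {
        "diagnosis": ["M", "B", "B"],
        "radius_mean": [17.99, 13.54, 13.08],
        "texture_mean": [10.38, 14.36, 15.71],
    },
    index=pd.Index([842302, 8510426, 8510653], name="id"),
)

by_axis = toy.drop(["diagnosis"], axis=1)      # drop a column via axis=1
by_keyword = toy.drop(columns=["diagnosis"])   # same thing, spelled out
without_row = toy.drop(842302)                 # default axis=0 drops a row label

print(by_axis.columns.tolist())
print(without_row.index.tolist())
```

Like `rename`, `drop` returns a new `DataFrame` and leaves `toy` as it was.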


---

### Data Selection

We have already talked about indexing through rows and columns in pandas, but we can also select our data based on its values:

In [275]:
dd.head()

Unnamed: 0_level_0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,


In [279]:
dd[dd.diagnosis != 'M'].head()

Unnamed: 0_level_0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
8510426,B,13.54,14.36,87.46,566.3,0.09779,0.08129,0.06664,0.04781,0.1885,...,19.26,99.7,711.2,0.144,0.1773,0.239,0.1288,0.2977,0.07259,
8510653,B,13.08,15.71,85.63,520.0,0.1075,0.127,0.04568,0.0311,0.1967,...,20.49,96.09,630.5,0.1312,0.2776,0.189,0.07283,0.3184,0.08183,
8510824,B,9.504,12.44,60.34,273.9,0.1024,0.06492,0.02956,0.02076,0.1815,...,15.66,65.13,314.9,0.1324,0.1148,0.08867,0.06227,0.245,0.07773,
854941,B,13.03,18.42,82.61,523.8,0.08983,0.03766,0.02562,0.02923,0.1467,...,22.81,84.46,545.9,0.09701,0.04619,0.04833,0.05013,0.1987,0.06169,
85713702,B,8.196,16.84,51.71,201.9,0.086,0.05943,0.01588,0.005917,0.1769,...,21.96,57.26,242.2,0.1297,0.1357,0.0688,0.02564,0.3105,0.07409,


For a more concise (and readable) syntax, we can use the `query` method to perform selection on a `DataFrame`. Instead of having to type the fully-specified column, we can simply pass a string that describes what to select. The query above is then simply:

In [110]:
cond = 'diagnosis != "M"'
dd.query(cond).head()

Unnamed: 0_level_0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
8510426,B,13.54,14.36,87.46,566.3,0.09779,0.08129,0.06664,0.04781,0.1885,...,19.26,99.7,711.2,0.144,0.1773,0.239,0.1288,0.2977,0.07259,
8510653,B,13.08,15.71,85.63,520.0,0.1075,0.127,0.04568,0.0311,0.1967,...,20.49,96.09,630.5,0.1312,0.2776,0.189,0.07283,0.3184,0.08183,
8510824,B,9.504,12.44,60.34,273.9,0.1024,0.06492,0.02956,0.02076,0.1815,...,15.66,65.13,314.9,0.1324,0.1148,0.08867,0.06227,0.245,0.07773,
854941,B,13.03,18.42,82.61,523.8,0.08983,0.03766,0.02562,0.02923,0.1467,...,22.81,84.46,545.9,0.09701,0.04619,0.04833,0.05013,0.1987,0.06169,
85713702,B,8.196,16.84,51.71,201.9,0.086,0.05943,0.01588,0.005917,0.1769,...,21.96,57.26,242.2,0.1297,0.1357,0.0688,0.02564,0.3105,0.07409,


In [111]:
charac = 'M'
cond = 'diagnosis != @charac'
dd.query(cond).head()

Unnamed: 0_level_0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
8510426,B,13.54,14.36,87.46,566.3,0.09779,0.08129,0.06664,0.04781,0.1885,...,19.26,99.7,711.2,0.144,0.1773,0.239,0.1288,0.2977,0.07259,
8510653,B,13.08,15.71,85.63,520.0,0.1075,0.127,0.04568,0.0311,0.1967,...,20.49,96.09,630.5,0.1312,0.2776,0.189,0.07283,0.3184,0.08183,
8510824,B,9.504,12.44,60.34,273.9,0.1024,0.06492,0.02956,0.02076,0.1815,...,15.66,65.13,314.9,0.1324,0.1148,0.08867,0.06227,0.245,0.07773,
854941,B,13.03,18.42,82.61,523.8,0.08983,0.03766,0.02562,0.02923,0.1467,...,22.81,84.46,545.9,0.09701,0.04619,0.04833,0.05013,0.1987,0.06169,
85713702,B,8.196,16.84,51.71,201.9,0.086,0.05943,0.01588,0.005917,0.1769,...,21.96,57.26,242.2,0.1297,0.1357,0.0688,0.02564,0.3105,0.07409,
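
To see that a boolean mask and a `query` string select the same rows, here is a minimal sketch on a hypothetical four-row frame (`toy`, `label`, and `threshold` are made-up names). It also combines two conditions: with a mask each condition needs its own parentheses and `&`, while inside `query` a plain `and` works and `@` refers to a Python variable:

```python
import pandas as pd

# Hypothetical toy frame standing in for `dd`
toy = pd.DataFrame(
    {
        "diagnosis": ["M", "B", "B", "M"],
        "radius_mean": [17.99, 13.54, 9.504, 20.57],
    }
)

label = "B"
threshold = 10.0

# Boolean masking: combine conditions with & and parentheses
masked = toy[(toy["diagnosis"] == label) & (toy["radius_mean"] > threshold)]

# The same selection as a query string; @name reads a local variable
queried = toy.query("diagnosis == @label and radius_mean > @threshold")

print(queried)
```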


In [114]:
dd.loc[dd.diagnosis=='B'].head()

Unnamed: 0_level_0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
8510426,B,13.54,14.36,87.46,566.3,0.09779,0.08129,0.06664,0.04781,0.1885,...,19.26,99.7,711.2,0.144,0.1773,0.239,0.1288,0.2977,0.07259,
8510653,B,13.08,15.71,85.63,520.0,0.1075,0.127,0.04568,0.0311,0.1967,...,20.49,96.09,630.5,0.1312,0.2776,0.189,0.07283,0.3184,0.08183,
8510824,B,9.504,12.44,60.34,273.9,0.1024,0.06492,0.02956,0.02076,0.1815,...,15.66,65.13,314.9,0.1324,0.1148,0.08867,0.06227,0.245,0.07773,
854941,B,13.03,18.42,82.61,523.8,0.08983,0.03766,0.02562,0.02923,0.1467,...,22.81,84.46,545.9,0.09701,0.04619,0.04833,0.05013,0.1987,0.06169,
85713702,B,8.196,16.84,51.71,201.9,0.086,0.05943,0.01588,0.005917,0.1769,...,21.96,57.26,242.2,0.1297,0.1357,0.0688,0.02564,0.3105,0.07409,


And we can also select the columns we want to look into right away:

In [115]:
dd.loc[dd.diagnosis == 'B', ['diagnosis', 'radius_mean', 'texture_mean']].head()

Unnamed: 0_level_0,diagnosis,radius_mean,texture_mean
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
8510426,B,13.54,14.36
8510653,B,13.08,15.71
8510824,B,9.504,12.44
854941,B,13.03,18.42
85713702,B,8.196,16.84
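
The `.loc[rows, columns]` pattern above can be sketched on a hypothetical three-row frame (`toy` is an assumption, with values copied from the tables in this section). The row selector is a boolean `Series` and the column selector is a list of names:

```python
import pandas as pd

# Hypothetical toy frame indexed by patient id, standing in for `dd`
toy = pd.DataFrame(
    {
        "diagnosis": ["M", "B", "B"],
        "radius_mean": [17.99, 13.54, 13.08],
        "texture_mean": [10.38, 14.36, 15.71],
    },
    index=pd.Index([842302, 8510426, 8510653], name="id"),
)

is_benign = toy["diagnosis"] == "B"   # boolean Series, one entry per row
subset = toy.loc[is_benign, ["radius_mean", "texture_mean"]]

print(subset)
```

Bracket access (`toy["diagnosis"]`) is used here instead of attribute access (`toy.diagnosis`) because it also works for column names that contain spaces, such as `concave points_mean`.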


---

### Data transformation

In [118]:
dd.sum()[0]

'MMMMMMMMMMMMMMMMMMMBBBMMMMMMMMMMMMMMMBMMMMMMMMBMBBBBBMMBMMBBBBMBMMBBBBMBMMBMBMMBBBMMBMMMBBBMBBMMBBBMMBBBBMBBMBBBBBBBBMMMBMMBBBMMBMBMMBMMBBMBBMBBBBMBBBBBBBBBMBBBBMMBMBBMMBBMMBBBBMBBMMMBMBMBBBMBBMMBMMMMBMMMBMBMBBMBMMMMBBMMBBBMBBBBBMMBBMBBMMBMBBBBMBBBBBMBMMMMMMMMMMMMMMBBBBBBMBMBBMBBMBMMBBBBBBBBBBBBBMBBMBMBBBBBBBBBBBBBBMBBBMBMBBBBMMMBBBBMBMBMBBBMBBBBBBBMMMBBBBBBBBBBBMMBMMMBMMBBBBBMBBBBBMBBBMBBMMBBBBBBMBBBBBBBMBBBBBMBBMBBBBBBBBBBBBMBMMBMBBBBBMBBMBMBBMBMBBBBBBBBMMBBBBBBMBBBBBBBBBBMBBBBBBBMBMBBMBBBBBMMBMBMBBBBBMBBMBMBMMBBBMBBBBBBBBBBBMBMMBBBBBBBBBBBBBBBBBBBBBBBBBMMMMMMB'

In [317]:
# dd.sum(axis=1)

<center>
    <img src="pandas_function.png" />
</center>

In [119]:
dd.agg([np.sum, np.mean, np.std])

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
sum,MMMMMMMMMMMMMMMMMMMBBBMMMMMMMMMMMMMMMBMMMMMMMM...,8038.429,10975.81,52330.38,372631.9,54.829,59.37002,50.526811,27.834994,103.0811,...,14610.34,61031.63,501051.8,75.31773,144.67681,154.875247,65.210941,165.053,47.76517,0.0
mean,,14.127292,19.289649,91.969033,654.889104,0.09636,0.104341,0.088799,0.048919,0.181162,...,25.677223,107.261213,880.583128,0.132369,0.254265,0.272188,0.114606,0.290076,0.083946,
std,,3.524049,4.301036,24.298981,351.914129,0.014064,0.052813,0.07972,0.038803,0.027414,...,6.146258,33.602542,569.356993,0.022832,0.157336,0.208624,0.065732,0.061867,0.018061,


In [120]:
dd.agg([np.sum, np.mean, np.std])['radius_mean']

sum     8038.429000
mean      14.127292
std        3.524049
Name: radius_mean, dtype: float64

In [121]:
dd['radius_mean'].agg([np.sum, np.mean, np.std])

sum     8038.429000
mean      14.127292
std        3.524049
Name: radius_mean, dtype: float64

**Note** that with this method we take a `DataFrame` and return a single value for each column (by applying a function that takes all the values in that column).
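
A minimal sketch of this reduction on a hypothetical frame with easy numbers (`toy` is made up for illustration). It passes the function names as strings, an alternative spelling to `np.sum`, `np.mean`, and `np.std` that `agg` also accepts:

```python
import pandas as pd

# Hypothetical toy frame with values that are easy to check by hand
toy = pd.DataFrame(
    {"radius_mean": [1.0, 2.0, 3.0], "texture_mean": [10.0, 20.0, 60.0]}
)

# One row per aggregation function, one column per original column
summary = toy.agg(["sum", "mean", "std"])

print(summary)
```

Each cell of `summary` is a single number computed from a whole column of `toy`.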

What if we want to apply a function element-wise?

In [136]:
class A:
    def __init__(self, a, b):
        self.a = a
        self.b = b
        
    def __repr__(self):
        return "<A(a={self.a}, b={self.b})>".format(self=self)
    
    def __call__(self, g):
        return g + 3
    
    def _repr_html_(self):
        return "<h1 style='color: red'>Hi</h1>"
    
        
a = A(3, 4)
a
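
One possible way to apply a function to every single element is `Series.map`. The sketch below assumes a hypothetical frame `toy` and a helper `add_three` (both made up; the helper mirrors the `g + 3` in `__call__` above):

```python
import pandas as pd

# Hypothetical toy frame with values that are easy to check by hand
toy = pd.DataFrame({"x": [1.0, 2.5, 4.0], "y": [10.0, 20.0, 30.0]})


def add_three(value):
    """Receive one element at a time and return one element."""
    return value + 3


# Element-wise on a single column
shifted_x = toy["x"].map(add_three)

# Element-wise on every column: apply hands over each column, map walks its values
shifted_all = toy.apply(lambda column: column.map(add_three))

print(shifted_all)
```

For simple arithmetic like this, `toy + 3` gives the same result and is faster; `map` is mainly useful when the function cannot be written as a vectorized expression.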

#### apply method

In [137]:
dd[['radius_mean', 'texture_mean']].apply(lambda x: x - x.min()).head()

Unnamed: 0_level_0,radius_mean,texture_mean
id,Unnamed: 1_level_1,Unnamed: 2_level_1
842302,11.009,0.67
842517,13.589,8.06
84300903,12.709,11.54
84348301,4.439,10.67
84358402,13.309,4.63


Let's verify this

In [138]:
dd.iloc[0]['radius_mean'] - dd['radius_mean'].min()

11.008999999999999

We can do the same thing across columns, that is, row by row, by passing `axis=1`:

In [139]:
dd[['radius_mean', 'texture_mean']].apply(lambda x: x - x.min(), axis=1).head()

Unnamed: 0_level_0,radius_mean,texture_mean
id,Unnamed: 1_level_1,Unnamed: 2_level_1
842302,7.61,0.0
842517,2.8,0.0
84300903,0.0,1.56
84348301,0.0,8.96
84358402,5.95,0.0
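
The difference between the two directions is easier to see on a tiny hypothetical frame (`toy` is made up for illustration). With the default `axis=0` the function receives one column at a time; with `axis=1` it receives one row at a time:

```python
import pandas as pd

# Hypothetical 2x2 frame with values that are easy to check by hand
toy = pd.DataFrame({"a": [1.0, 4.0], "b": [3.0, 2.0]})

# axis=0 (default): subtract each column's minimum from that column
per_column = toy.apply(lambda s: s - s.min())

# axis=1: subtract each row's minimum from that row
per_row = toy.apply(lambda s: s - s.min(), axis=1)

print(per_column)
print(per_row)
```

In `per_column` every column contains a zero; in `per_row` every row contains a zero, which matches the pattern in the output above.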


We can also define our function separately:

In [140]:
def z_score(df):
    return (df - df.mean()) / df.std()

In [141]:
dd[['radius_mean', 'texture_mean']].apply(z_score).head()

Unnamed: 0_level_0,radius_mean,texture_mean
id,Unnamed: 1_level_1,Unnamed: 2_level_1
842302,1.0961,-2.071512
842517,1.828212,-0.353322
84300903,1.578499,0.455786
84348301,-0.768233,0.253509
84358402,1.748758,-1.150804
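
A quick way to check a z-score function is that the result should have mean 0 and standard deviation 1 in every column. A minimal sketch, assuming a hypothetical five-row frame `toy` built from the first five `radius_mean` values shown earlier:

```python
import pandas as pd


def z_score(column):
    """Center a column on its mean and scale it by its standard deviation."""
    return (column - column.mean()) / column.std()


# Hypothetical toy frame; the values are the first five radius_mean entries
toy = pd.DataFrame({"radius_mean": [17.99, 20.57, 19.69, 11.42, 20.29]})

scaled = toy.apply(z_score)

print(scaled["radius_mean"].mean())  # very close to 0, up to rounding error
print(scaled["radius_mean"].std())   # very close to 1, up to rounding error
```

The numbers differ from the output above because the mean and standard deviation here come from five rows only, not from the full data set.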


---

### Merge and Split

In [144]:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])

df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']},
                   index=[4, 5, 6, 7])

#### concat

In [147]:
pd.concat((df1, df2), axis=0)

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7


In [150]:
pd.concat((df1, df2), axis=1)

Unnamed: 0,A,B,C,D,A.1,B.1,C.1,D.1
0,A0,B0,C0,D0,,,,
1,A1,B1,C1,D1,,,,
2,A2,B2,C2,D2,,,,
3,A3,B3,C3,D3,,,,
4,,,,,A4,B4,C4,D4
5,,,,,A5,B5,C5,D5
6,,,,,A6,B6,C6,D6
7,,,,,A7,B7,C7,D7
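
The `NaN` blocks above appear because `concat` with `axis=1` aligns rows on the index, and `df1` and `df2` share no index labels. A small sketch with two hypothetical frames (`left` and `right` are made up) that overlap in exactly one label:

```python
import pandas as pd

# Hypothetical frames whose indexes overlap only at label 1
left = pd.DataFrame({"A": ["A0", "A1"]}, index=[0, 1])
right = pd.DataFrame({"B": ["B1", "B2"]}, index=[1, 2])

# Default join="outer": keep every index label, fill the gaps with NaN
outer = pd.concat([left, right], axis=1)

# join="inner": keep only the labels present in both frames
inner = pd.concat([left, right], axis=1, join="inner")

# Row-wise stacking with a fresh 0..n-1 index
stacked = pd.concat([left, left], ignore_index=True)

print(outer)
print(inner)
```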


#### append

**Note** that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; on recent versions, `pd.concat([df1, df2])` gives the same result.

In [151]:
df1.append(df2)

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7


#### merge

pandas provides a single function, `merge()`, as the entry point for all standard database join operations between `DataFrame` objects.

In [152]:
pd.merge(df1, df2)

Unnamed: 0,A,B,C,D


In [157]:
pd.merge(df1, df2, how="outer", on='C')

Unnamed: 0,A_x,B_x,C,D_x,A_y,B_y,D_y
0,A0,B0,C0,D0,,,
1,A1,B1,C1,D1,,,
2,A2,B2,C2,D2,,,
3,A3,B3,C3,D3,,,
4,,,C4,,A4,B4,D4
5,,,C5,,A5,B5,D5
6,,,C6,,A6,B6,D6
7,,,C7,,A7,B7,D7
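
By default `merge` performs an inner join on all columns the two frames have in common, which is why `pd.merge(df1, df2)` comes back empty: no row of `df1` matches a row of `df2` in `A`, `B`, `C`, and `D` at once. A join becomes more informative when the frames share some key values. The sketch below uses two hypothetical frames (`patients` and `sizes`, with made-up ids):

```python
import pandas as pd

# Hypothetical frames sharing the key column "id" for ids 2 and 3 only
patients = pd.DataFrame({"id": [1, 2, 3], "diagnosis": ["M", "B", "B"]})
sizes = pd.DataFrame({"id": [2, 3, 4], "radius_mean": [13.54, 9.504, 20.57]})

# Inner join (the default): only ids present in both frames
inner = pd.merge(patients, sizes, on="id")

# Outer join: every id from either frame, NaN where one side has no data
outer = pd.merge(patients, sizes, on="id", how="outer")

print(inner)
print(outer)
```

`how` also accepts `"left"` and `"right"`, which keep all keys of one frame only.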


---

### Split-Apply-Combine

In [161]:
grouped = dd.groupby('diagnosis')

for name, df in grouped:
    print(name)

B
M


In [162]:
grouped.groups.keys()

dict_keys(['B', 'M'])

Now we probably want to apply a function (a data transformation) to each group. Depending on whether we want to reduce each column of a group to a single value or to transform every element within its group, we can use `agg()` or `apply()`, respectively.

In [163]:
grouped.agg(np.min)

Unnamed: 0_level_0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
diagnosis,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
B,6.981,9.71,43.79,143.5,0.05263,0.01938,0.0,0.0,0.106,0.05185,...,12.02,50.41,185.2,0.07117,0.02729,0.0,0.0,0.1566,0.05521,
M,10.95,10.38,71.9,361.6,0.07371,0.04605,0.02398,0.02031,0.1308,0.04996,...,16.67,85.1,508.1,0.08822,0.05131,0.02398,0.02899,0.1565,0.05504,


In [164]:
grouped.apply(lambda x: x-x.min()).head()

Unnamed: 0_level_0,Unnamed: 32,area_mean,area_se,area_worst,compactness_mean,compactness_se,compactness_worst,concave points_mean,concave points_se,concave points_worst,...,radius_worst,smoothness_mean,smoothness_se,smoothness_worst,symmetry_mean,symmetry_se,symmetry_worst,texture_mean,texture_se,texture_worst
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
842302,,639.4,139.41,1510.9,0.23155,0.040618,0.61429,0.12679,0.010696,0.23641,...,12.54,0.04469,0.003732,0.07398,0.1111,0.022148,0.3036,0.0,0.5432,0.66
842517,,964.4,60.09,1447.9,0.03259,0.004658,0.13529,0.04986,0.008226,0.15701,...,12.15,0.01103,0.002558,0.03558,0.0504,0.006008,0.1185,7.39,0.3718,6.74
84300903,,841.4,80.04,1200.9,0.11385,0.031638,0.37319,0.10759,0.015406,0.21401,...,10.73,0.03589,0.003483,0.05618,0.0761,0.014618,0.2048,10.87,0.4248,8.86
84348301,,24.5,13.24,59.6,0.23785,0.066158,0.81499,0.08489,0.013496,0.22851,...,2.07,0.06879,0.006443,0.12158,0.1289,0.051748,0.5073,10.0,0.7939,9.83
84358402,,935.4,80.45,1066.9,0.08675,0.016188,0.15369,0.08399,0.013676,0.13351,...,9.7,0.02659,0.008823,0.04918,0.0501,0.009678,0.0799,3.96,0.4192,0.0


Note that this returns a single `DataFrame`. And here is the whole process of grouping, applying, and combining in a single line:

In [165]:
new_dd = dd.groupby('diagnosis').apply(lambda x: x - x.min()).copy()

In [166]:
new_dd.head()

Unnamed: 0_level_0,Unnamed: 32,area_mean,area_se,area_worst,compactness_mean,compactness_se,compactness_worst,concave points_mean,concave points_se,concave points_worst,...,radius_worst,smoothness_mean,smoothness_se,smoothness_worst,symmetry_mean,symmetry_se,symmetry_worst,texture_mean,texture_se,texture_worst
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
842302,,639.4,139.41,1510.9,0.23155,0.040618,0.61429,0.12679,0.010696,0.23641,...,12.54,0.04469,0.003732,0.07398,0.1111,0.022148,0.3036,0.0,0.5432,0.66
842517,,964.4,60.09,1447.9,0.03259,0.004658,0.13529,0.04986,0.008226,0.15701,...,12.15,0.01103,0.002558,0.03558,0.0504,0.006008,0.1185,7.39,0.3718,6.74
84300903,,841.4,80.04,1200.9,0.11385,0.031638,0.37319,0.10759,0.015406,0.21401,...,10.73,0.03589,0.003483,0.05618,0.0761,0.014618,0.2048,10.87,0.4248,8.86
84348301,,24.5,13.24,59.6,0.23785,0.066158,0.81499,0.08489,0.013496,0.22851,...,2.07,0.06879,0.006443,0.12158,0.1289,0.051748,0.5073,10.0,0.7939,9.83
84358402,,935.4,80.45,1066.9,0.08675,0.016188,0.15369,0.08399,0.013676,0.13351,...,9.7,0.02659,0.008823,0.04918,0.0501,0.009678,0.0799,3.96,0.4192,0.0
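
The split-apply-combine steps can be sketched on a hypothetical four-row frame (`toy` is made up, with values copied from earlier tables). Instead of `apply`, this sketch uses `transform("min")`, an alternative that returns the group minimum aligned with the original rows, so it can be subtracted directly:

```python
import pandas as pd

# Hypothetical toy frame standing in for `dd`
toy = pd.DataFrame(
    {
        "diagnosis": ["M", "B", "M", "B"],
        "radius_mean": [17.99, 13.54, 20.57, 9.504],
    }
)

grouped = toy.groupby("diagnosis")["radius_mean"]

# Aggregation: one value per group
group_min = grouped.agg("min")

# Transformation: one value per original row, taken from that row's group
shifted = toy["radius_mean"] - grouped.transform("min")

print(group_min)
print(shifted)
```

The smallest row of each group ends up at 0 in `shifted`, and the result has as many rows as `toy`.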


---