# Last time we saw how we could read information from different files (.txt and .jpg)
# Today we will start with an easier way of handling paths: namely, pathlib!

In [10]:
# The way I have shown this to you before is with strings specifying the paths:

path_to_file_str = "files/COVID-00003b.jpg"
path_to_file_str.split("/")[1]  # Returns COVID-00003b.jpg

# But the longer path we have, the more complicated it becomes to reference different parts of the path!
# reading path.split("/")[5] from a path = "C/home/docs/super-secret/text-files/folder-1/subfolder-5/text-file-1.txt"
# is.. hard

'COVID-00003b.jpg'

In [11]:
# So now, we are introducing pathlib!
# pathlib makes it easier for us to handle long paths,
# and the resulting code is neat to read

from pathlib import Path

path_to_file = Path("files/COVID-00003b.jpg")
print(path_to_file.parent)
print(path_to_file.name)


files
COVID-00003b.jpg


In [12]:
# with pathlib it is easy to create new paths from old paths:
# maybe I would like to insert another folder to put the file in

new_path_to_file = path_to_file.parent / "unchanged_files" / path_to_file.name
print(new_path_to_file)

# if I would do the same using strings, it looks like this:
new_path_to_file = path_to_file_str.split("/")[0] + "\\unchanged_files\\" + path_to_file_str.split("/")[1]
print(new_path_to_file)
# which is a lot less intuitive for someone coming into the code at a later stage!

files/unchanged_files/COVID-00003b.jpg
files\unchanged_files\COVID-00003b.jpg


## The final thing with pathlib is that it is OS independent: it uses the correct path separator for the OS the code runs on

### Working with relative paths in a project that should run on both OSX and Windows can create problems when using strings, because the two systems use different slashes as separators. Pathlib handles that for us!

Read more here: https://docs.python.org/3/library/pathlib.html
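
To see the separator handling without switching computers, here is a minimal sketch using the "pure" path classes, which only manipulate the path text and never touch the file system. The folder and file names are the ones from the cells above; the variable names are just for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same path parts, rendered with the rules of each OS family
parts = ("files", "unchanged_files", "COVID-00003b.jpg")

posix_path = PurePosixPath(*parts)      # OSX / Linux flavour
windows_path = PureWindowsPath(*parts)  # Windows flavour

print(posix_path)               # forward slashes
print(windows_path)             # backslashes
print(windows_path.as_posix())  # Windows path written with forward slashes
```

In normal code we just use `Path`, which picks the flavour that matches the OS the code is running on.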

# Now we move on to the library called shutil, which is nice to have when we want to move files around and don't care specifically about the file contents
# i.e. we want to move file A from place B to place C, but we do not need to read the contents of file A

In [13]:
import shutil

path_to_file = Path("files/COVID-00003b.jpg")
new_path_to_file = path_to_file.parent / "unchanged_files" / path_to_file.name


In [14]:
path_to_file

PosixPath('files/COVID-00003b.jpg')

In [15]:
# Now we have defined the path to the file and what we wish to be the new path
# We can choose if we would like to move the file or copy it!

shutil.copy(path_to_file, new_path_to_file)

FileNotFoundError: [Errno 2] No such file or directory: 'files/unchanged_files/COVID-00003b.jpg'

In [16]:
# Hmm that did not work because the folder unchanged_files does not exist yet
# so we should change our code to something like this:

try:
    shutil.copy(path_to_file, new_path_to_file)
except FileNotFoundError:
    # the destination FOLDER is missing, so we create the parent of the new path
    # (calling mkdir on new_path_to_file itself would create a folder named COVID-00003b.jpg!)
    new_path_to_file.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(path_to_file, new_path_to_file) # <--- and then copy the file
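
If you want to try this without touching your real files, here is a small self-contained sketch that works inside a temporary folder. The helper name `copy_with_parents` and the file `example.txt` are made up for this illustration; the idea is simply to create the destination folder first and then copy.

```python
import shutil
import tempfile
from pathlib import Path


def copy_with_parents(src: Path, dst: Path) -> Path:
    """Copy src to dst, creating the destination folder if it is missing."""
    dst.parent.mkdir(parents=True, exist_ok=True)  # create the folder, not the file path
    return Path(shutil.copy(src, dst))


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny "files" folder with one file in it
    src = Path(tmp) / "files" / "example.txt"
    src.parent.mkdir(parents=True)
    src.write_text("hello")

    # Same pattern as above: put the copy in a subfolder that does not exist yet
    dst = src.parent / "unchanged_files" / src.name
    copied = copy_with_parents(src, dst)

    copied_is_file = copied.is_file()
    copied_text = copied.read_text()
    print(copied_is_file, copied_text)
```

Creating the folder up front avoids the try/except altogether; `exist_ok=True` makes it safe to call even when the folder is already there.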

# Now, we will look at one more nice thing, that can improve file management for us:
# f-strings

## F-strings are available from Python 3.6 and onwards, meaning that if you know
## that you are working with a project that should be backwards compatible, maybe this is not the best
## otherwise: go ahead!

In [17]:
# when we are working with strings where we want to modify only parts of it, 
# a neat way of making that readable is by using f-strings!

# let's start with a simple example where we want to create lots of files and
# name them using an index at the end.

for i in range(10):
    file_name = f"covid_0{i}.txt"
    print(file_name)

covid_00.txt
covid_01.txt
covid_02.txt
covid_03.txt
covid_04.txt
covid_05.txt
covid_06.txt
covid_07.txt
covid_08.txt
covid_09.txt
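
The hard-coded `0` in the file name only works while the index has a single digit. A small sketch of an alternative: an f-string format specification (`:02d` means "integer, at least 2 digits, padded with zeros") keeps the width constant.

```python
# Zero-padding with a format spec instead of a hard-coded "0"
file_names = [f"covid_{i:02d}.txt" for i in (0, 9, 10, 11)]
print(file_names)
# → ['covid_00.txt', 'covid_09.txt', 'covid_10.txt', 'covid_11.txt']
```

With `f"covid_0{i}.txt"` the index 10 would instead become `covid_010.txt`.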


In [18]:
sample_dataset_path = Path("../exercises/sample_dataset") #<-- this is a RELATIVE path
i = 0
for old_path in sample_dataset_path.glob("*/*"): #<-- the stars are wildcards: "*/*" matches anything inside any subfolder
    print(old_path)
    new_path = old_path.parents[2] / "cleaned_dataset" / "train" / old_path.parent.name / (f"{old_path.parent.name}_{i}" + old_path.suffix)
    print(new_path)
    i += 1
    try:
        #shutil.copy(old_path, new_path)
        pass
    except FileNotFoundError:
        # the destination folder is missing: create the parent folder, not the file path itself
        new_path.parent.mkdir(parents=True, exist_ok=True)
        #shutil.copy(old_path, new_path)
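
To make it clearer what `parents[2]`, `parent.name` and `suffix` pick out in the loop, here is a sketch on one made-up path. The subfolder `cats` and the file `img.jpg` are assumptions for the example; `PurePosixPath` is used so the result looks the same on every OS.

```python
from pathlib import PurePosixPath

# A hypothetical file inside the sample dataset
old_path = PurePosixPath("../exercises/sample_dataset/cats/img.jpg")
i = 0

print(old_path.parents[0])  # ../exercises/sample_dataset/cats
print(old_path.parents[1])  # ../exercises/sample_dataset
print(old_path.parents[2])  # ../exercises
print(old_path.parent.name) # cats
print(old_path.suffix)      # .jpg

# Same expression as in the loop above
new_path = (
    old_path.parents[2]
    / "cleaned_dataset"
    / "train"
    / old_path.parent.name
    / (f"{old_path.parent.name}_{i}" + old_path.suffix)
)
print(new_path)
```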

# Dataframes with Pandas

We will explore pandas and its basic functionality. 
Three central concepts to pandas are:
- series
- dataframes
- index

In [20]:
import pandas as pd
import numpy as np

In [22]:
import seaborn as sns
tips = sns.load_dataset("tips")
tips.head(5)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


## pd.Series
A series is a one-dimensional data array where every value has a label (the index)

TODO:
1. Create a series
2. Look at the attributes: values, index
3. Access values, single and slices
4. Change the index
5. Create series from dictionary, list, numpy array, tuple

In [24]:
# 1. 
apples = pd.Series([55, 22, 33, 11, 36], index=['ica', 'willys', 'coop', 'hemköp', 'livs'])
apples

ica       55
willys    22
coop      33
hemköp    11
livs      36
dtype: int64

In [25]:
apples['willys']

22

In [27]:
apples.index

Index(['ica', 'willys', 'coop', 'hemköp', 'livs'], dtype='object')

In [28]:
d1 = {11:1, 12:2, 13:3}
pd.Series(d1)

11    1
12    2
13    3
dtype: int64
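
The remaining points on the TODO list can be sketched as follows, reusing the `apples` series from above. The new index labels and the small arrays are made up for the example.

```python
import numpy as np
import pandas as pd

apples = pd.Series([55, 22, 33, 11, 36], index=['ica', 'willys', 'coop', 'hemköp', 'livs'])

# 2. attributes
values = apples.values  # the underlying numpy array
labels = apples.index   # the labels

# 3. slices: label slices include the end label, position slices do not
label_slice = apples.loc['willys':'hemköp']
position_slice = apples.iloc[1:3]

# 4. change the index by assigning a whole new one (on a copy, to keep apples intact)
relabeled = apples.copy()
relabeled.index = ['a', 'b', 'c', 'd', 'e']

# 5. other ways of creating a series
from_array = pd.Series(np.array([1.5, 2.5]), index=['x', 'y'])
from_tuple = pd.Series((7, 8, 9))  # gets the default index 0, 1, 2

print(label_slice)
print(position_slice)
print(relabeled)
```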

### pd.DataFrame

A two-dimensional data matrix. A dataframe has both rows (defined by the index) and columns. Each column is a pandas series.
To create a dataframe we need to supply data and names of columns, and optionally an index.

1. Create a simple empty dataframe
2. Create a dataframe from series


In [29]:
df = pd.DataFrame(data=[['katt', 6, 0.3], ['hund', 20, 0.6], ['fågel', 0.2, 0.1]], 
                        index=[1,2,3], columns=['animal', 'weight','height'])

In [30]:
type(df)

pandas.core.frame.DataFrame

In [31]:
animal = pd.Series(['katt', 'hund', 'fågel'])
weight = pd.Series([6, 20, 0.2])

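# Note: both series have the default index 0, 1, 2. Asking for index=[1,2,3] therefore
# picks the values at labels 1 and 2, and fills label 3 with NaN (see the output)
df = pd.DataFrame({'animal':animal, 'weight':weight}, index=[1,2,3])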
df

Unnamed: 0,animal,weight
1,hund,20.0
2,fågel,0.2
3,,
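
The output above loses `katt` and gets an empty row because pandas aligns on index labels. A minimal sketch of one way to avoid that: give the series the index we want from the start, so there is nothing to re-align. The name `df_aligned` is only used here.

```python
import pandas as pd

# Both series share the index 1, 2, 3
animal = pd.Series(['katt', 'hund', 'fågel'], index=[1, 2, 3])
weight = pd.Series([6, 20, 0.2], index=[1, 2, 3])

df_aligned = pd.DataFrame({'animal': animal, 'weight': weight})
print(df_aligned)
```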


## pd.Index

The index represents the rows in a dataframe or the elements in a series. An index is immutable (its elements can't be changed), which makes it safer to share between series or dataframes.

TODO:
1. Set operations such as intersection, union etc.



In [32]:
ind1 = pd.Index([1, 5, 6])
ind2 = pd.Index([1, 5, '6', 7])

ind1.symmetric_difference(ind2)
ind1.intersection(ind2)

Index([1, 5], dtype='object')
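
A small sketch of the set operations and of the immutability, using two integer-only indexes (slightly different from the ones above, to keep the results easy to read):

```python
import pandas as pd

ind1 = pd.Index([1, 5, 6])
ind2 = pd.Index([1, 5, 7])

union = ind1.union(ind2)                        # labels in either index
intersection = ind1.intersection(ind2)          # labels in both
sym_diff = ind1.symmetric_difference(ind2)      # labels in exactly one of them
print(list(union), list(intersection), list(sym_diff))

# Trying to change a single element fails, because an index is immutable
try:
    ind1[0] = 100
    changed = True
except TypeError as error:
    changed = False
    print("Not allowed:", error)
```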

## Load data in different formats
- excel, csv, json


In [33]:
pokemon_df = pd.read_csv('files/Pokemon.csv')
iris_df = pd.read_json('files/iris.json')
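
If you do not have the files at hand, the readers can be tried on text held in memory. This is only a sketch: the two tiny datasets below are made up (borrowing a few values from the tables in this notebook), and `io.StringIO` stands in for a real file. For Excel files there is `pd.read_excel`, which needs an extra package such as openpyxl installed.

```python
import io

import pandas as pd

# Stand-ins for real files: the same text you would find in a .csv / .json file
csv_text = "Name,HP\nBulbasaur,45\nIvysaur,60\n"
json_text = '[{"species": "setosa", "sepalLength": 5.1}, {"species": "setosa", "sepalLength": 4.9}]'

csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df)
print(json_df)
```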

## Show data

- Use head and tail
- column types: dtypes
- statistics: mean, sum, quantile -> describe


In [34]:
iris_df.head()

Unnamed: 0,sepalLength,sepalWidth,petalLength,petalWidth,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [35]:
pokemon_df.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


In [36]:
pokemon_df.describe()

Unnamed: 0,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,362.81375,435.1025,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375
std,208.343798,119.96304,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129
min,1.0,180.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,184.75,330.0,50.0,55.0,50.0,49.75,50.0,45.0,2.0
50%,364.5,450.0,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,539.25,515.0,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,721.0,780.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0


In [37]:
iris_df.describe()

Unnamed: 0,sepalLength,sepalWidth,petalLength,petalWidth
count,150.0,150.0,150.0,150.0
mean,5.843333,3.057333,3.758,1.199333
std,0.828066,0.435866,1.765298,0.762238
min,4.3,2.0,1.0,0.1
25%,5.1,2.8,1.6,0.3
50%,5.8,3.0,4.35,1.3
75%,6.4,3.3,5.1,1.8
max,7.9,4.4,6.9,2.5


In [38]:
pokemon_df = pokemon_df.set_index('Name')
pokemon_df.head()

Name,#,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Bulbasaur,1,Grass,Poison,318,45,49,49,65,65,45,1,False
Ivysaur,2,Grass,Poison,405,60,62,63,80,80,60,1,False
Venusaur,3,Grass,Poison,525,80,82,83,100,100,80,1,False
VenusaurMega Venusaur,3,Grass,Poison,625,80,100,123,122,120,80,1,False
Charmander,4,Fire,,309,39,52,43,60,50,65,1,False


In [39]:
pokemon_df.index

Index(['Bulbasaur', 'Ivysaur', 'Venusaur', 'VenusaurMega Venusaur',
       'Charmander', 'Charmeleon', 'Charizard', 'CharizardMega Charizard X',
       'CharizardMega Charizard Y', 'Squirtle',
       ...
       'Noibat', 'Noivern', 'Xerneas', 'Yveltal', 'Zygarde50% Forme',
       'Diancie', 'DiancieMega Diancie', 'HoopaHoopa Confined',
       'HoopaHoopa Unbound', 'Volcanion'],
      dtype='object', name='Name', length=800)

## Indexing and subset
- iloc - integer location 
- loc - location
- by column
- by column value


df['col0'] returns a **column**, not a row
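
The difference between `loc` and `iloc` is easiest to see when the index is not 0, 1, 2, ... Here is a sketch on a small made-up dataframe with the labels 10, 20, 30:

```python
import pandas as pd

df_small = pd.DataFrame({'weight': [6.0, 20.0, 0.2]}, index=[10, 20, 30])

by_label = df_small.loc[10]        # the row with LABEL 10
by_position = df_small.iloc[0]     # the row at POSITION 0 (the same row here)

label_slice = df_small.loc[10:20]  # label slices include the end: 2 rows
position_slice = df_small.iloc[0:1]  # position slices exclude the end: 1 row

column = df_small['weight']                 # a column, as a series
heavy = df_small[df_small['weight'] > 1]    # rows selected by column value

print(label_slice)
print(position_slice)
print(heavy)
```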

In [45]:
iris_df.loc[2]

sepalLength       4.7
sepalWidth        3.2
petalLength       1.3
petalWidth        0.2
species        setosa
Name: 2, dtype: object

In [47]:
iris_df.iloc[5:10]

Unnamed: 0,sepalLength,sepalWidth,petalLength,petalWidth,species
5,5.4,3.9,1.7,0.4,setosa
6,4.6,3.4,1.4,0.3,setosa
7,5.0,3.4,1.5,0.2,setosa
8,4.4,2.9,1.4,0.2,setosa
9,4.9,3.1,1.5,0.1,setosa


In [None]:
pokemon_df.loc['Bulbasaur':'Venusaur']

Name,#,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Bulbasaur,1,Grass,Poison,318,45,49,49,65,65,45,1,False
Ivysaur,2,Grass,Poison,405,60,62,63,80,80,60,1,False
Venusaur,3,Grass,Poison,525,80,82,83,100,100,80,1,False


In [49]:
pokemon_df[pokemon_df['Type 1'] == 'Water']

Name,#,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Squirtle,7,Water,,314,44,48,65,50,64,43,1,False
Wartortle,8,Water,,405,59,63,80,65,80,58,1,False
Blastoise,9,Water,,530,79,83,100,85,105,78,1,False
BlastoiseMega Blastoise,9,Water,,630,79,103,120,135,115,78,1,False
Psyduck,54,Water,,320,50,52,48,65,50,55,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...
Froakie,656,Water,,314,41,56,40,62,44,71,6,False
Frogadier,657,Water,,405,54,63,52,83,56,97,6,False
Greninja,658,Water,Dark,530,72,95,67,103,71,122,6,False
Clauncher,692,Water,,330,50,53,62,58,63,44,6,False


### Group by
df.groupby is a very useful function when we want to perform some calculation on different groups of data

In [48]:
df.groupby?

Signature:
df.groupby(
    by=None,
    axis: 'Axis' = 0,
    level: 'Level | None' = None,
    as_index: 'bool' = True,
    sort: 'bool' = True,
    group_keys: 'bool' = True,
    squeeze: 'bool | lib.NoDefault' = <no_default>,
    observed: 'bool' = False,
    ...

In [51]:
pokemon_df.groupby(by=['Type 1', 'Type 2'])[['Speed', 'Attack']].mean().head(25)

Type 1,Type 2,Speed,Attack
Bug,Electric,86.5,62.0
Bug,Fighting,80.0,155.0
Bug,Fire,80.0,72.5
Bug,Flying,82.857143,70.142857
Bug,Ghost,40.0,90.0
Bug,Grass,44.5,73.833333
Bug,Ground,38.0,62.0
Bug,Poison,65.916667,68.333333
Bug,Rock,35.0,56.666667
Bug,Steel,63.428571,114.714286


### Iteration
Iterating is not good practice, but it may be necessary sometimes
- iteration over series and dataframes
- df.items()
- df.iterrows()
- never modify what you're iterating over!!!


In [53]:
for col in iris_df.items():
    print(col)
    break


('sepalLength', 0      5.1
1      4.9
2      4.7
3      4.6
4      5.0
      ... 
145    6.7
146    6.3
147    6.5
148    6.2
149    5.9
Name: sepalLength, Length: 150, dtype: float64)
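
`df.items()` goes column by column, as shown above, while `df.iterrows()` goes row by row. A sketch on a made-up two-row dataframe, together with the vectorized alternative that is usually preferred:

```python
import pandas as pd

rectangles = pd.DataFrame({'width': [1.0, 2.0], 'length': [3.0, 4.0]})

# Row by row: each step gives (index label, row as a series)
areas_loop = []
for idx, row in rectangles.iterrows():
    areas_loop.append(row['width'] * row['length'])  # collect results in a NEW list

# Vectorized: the same calculation on whole columns, no loop needed
areas_vectorized = (rectangles['width'] * rectangles['length']).tolist()

print(areas_loop)
print(areas_vectorized)
```

Note that the loop writes to a separate list instead of modifying `rectangles` while iterating over it.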


### Sorting

Pandas can sort in three different ways, based on:
- column values
- row values
- a combination of both

In [52]:
pokemon_df.sort_values('HP', ascending=False)

Name,#,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Blissey,242,Normal,,540,255,10,10,75,135,55,2,False
Chansey,113,Normal,,450,250,5,5,35,105,50,1,False
Wobbuffet,202,Psychic,,405,190,33,58,33,58,33,2,False
Wailord,321,Water,,500,170,90,45,90,45,60,3,False
Alomomola,594,Water,,470,165,75,80,40,45,65,5,False
...,...,...,...,...,...,...,...,...,...,...,...,...
Magikarp,129,Water,,200,20,10,55,15,20,80,1,False
Feebas,349,Water,,200,20,15,20,10,55,80,3,False
Duskull,355,Ghost,,295,20,40,90,30,90,25,3,False
Diglett,50,Ground,,265,10,55,25,35,45,95,1,False
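
Besides sorting on one column, we can sort on the index or on several columns at once. A sketch with three rows taken from the Pokemon table (the column names are shortened for the example):

```python
import pandas as pd

small = pd.DataFrame(
    {'type': ['Water', 'Fire', 'Water'], 'hp': [44, 39, 79]},
    index=['Squirtle', 'Charmander', 'Blastoise'],
)

# Sort on the index labels (alphabetically here)
by_index = small.sort_index()

# Sort on type A-Z first, and within each type on hp from high to low
by_type_then_hp = small.sort_values(['type', 'hp'], ascending=[True, False])

print(by_index)
print(by_type_then_hp)
```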


## Cleaning out missing or corrupted data

- We will consider missing/corrupt data as NaN values
- We have two approaches:
    1. Replace them with some other value
    2. Throw them away

In [54]:
mask = np.zeros(pokemon_df.shape, dtype=bool)
mask[30:35, 3:5] = True
pokemon_df[mask] = np.nan
pokemon_df.iloc[25:36]

Name,#,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Raticate,20,Normal,,413.0,55.0,81,60,50,70,97,1,False
Spearow,21,Normal,Flying,262.0,40.0,60,30,31,31,70,1,False
Fearow,22,Normal,Flying,442.0,65.0,90,65,61,61,100,1,False
Ekans,23,Poison,,288.0,35.0,60,44,40,54,55,1,False
Arbok,24,Poison,,438.0,60.0,85,69,65,79,80,1,False
Pikachu,25,Electric,,,,55,40,50,50,90,1,False
Raichu,26,Electric,,,,90,55,90,80,110,1,False
Sandshrew,27,Ground,,,,75,85,20,30,40,1,False
Sandslash,28,Ground,,,,100,110,45,55,65,1,False
Nidoran♀,29,Poison,,,,47,52,40,40,41,1,False


In [55]:
pokemon_df.loc['Pikachu']

#                   25
Type 1        Electric
Type 2             NaN
Total              NaN
HP                 NaN
Attack              55
Defense             40
Sp. Atk             50
Sp. Def             50
Speed               90
Generation           1
Legendary        False
Name: Pikachu, dtype: object

In [None]:
df = pokemon_df.dropna()
df.iloc[25:36]

Name,#,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Gloom,44,Grass,Poison,395.0,60.0,65,70,85,75,40,1,False
Vileplume,45,Grass,Poison,490.0,75.0,80,85,110,90,50,1,False
Paras,46,Bug,Grass,285.0,35.0,70,55,45,55,25,1,False
Parasect,47,Bug,Grass,405.0,60.0,95,80,60,80,30,1,False
Venonat,48,Bug,Poison,305.0,60.0,55,50,40,55,45,1,False
Venomoth,49,Bug,Poison,450.0,70.0,65,60,90,75,90,1,False
Poliwrath,62,Water,Fighting,510.0,90.0,95,95,70,90,70,1,False
Bellsprout,69,Grass,Poison,300.0,50.0,75,35,70,30,40,1,False
Weepinbell,70,Grass,Poison,390.0,65.0,90,50,85,45,55,1,False
Victreebel,71,Grass,Poison,490.0,80.0,105,65,100,70,70,1,False


In [None]:
df = pokemon_df.fillna(0)
df.iloc[25:36]

Name,#,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Raticate,20,Normal,0,413.0,55.0,81,60,50,70,97,1,False
Spearow,21,Normal,Flying,262.0,40.0,60,30,31,31,70,1,False
Fearow,22,Normal,Flying,442.0,65.0,90,65,61,61,100,1,False
Ekans,23,Poison,0,288.0,35.0,60,44,40,54,55,1,False
Arbok,24,Poison,0,438.0,60.0,85,69,65,79,80,1,False
Pikachu,25,Electric,0,0.0,0.0,55,40,50,50,90,1,False
Raichu,26,Electric,0,0.0,0.0,90,55,90,80,110,1,False
Sandshrew,27,Ground,0,0.0,0.0,75,85,20,30,40,1,False
Sandslash,28,Ground,0,0.0,0.0,100,110,45,55,65,1,False
Nidoran♀,29,Poison,0,0.0,0.0,47,52,40,40,41,1,False
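
Notice that `dropna()` threw away every Pokemon without a second type, and `fillna(0)` put the number 0 in the text column `Type 2`. Both functions can be made more selective. A sketch on a made-up three-row dataframe; the replacement values (`'Unknown'` and the column mean) are just one possible choice:

```python
import numpy as np
import pandas as pd

small = pd.DataFrame({
    'type2': ['Poison', None, None],
    'hp': [45.0, np.nan, 60.0],
})

# How many values are missing in each column?
missing_per_column = small.isna().sum()

# 1. Replace: a different fill value per column
filled = small.fillna({'type2': 'Unknown', 'hp': small['hp'].mean()})

# 2. Throw away: only drop rows where 'hp' is missing
dropped = small.dropna(subset=['hp'])

print(missing_per_column)
print(filled)
print(dropped)
```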


## pd.eval
pd.eval evaluates an expression, written as a string, on your dataframe, which can be very useful. The expression has to be written with valid Python syntax.


In [None]:
pd.eval?

In [None]:
df = pd.DataFrame({
    'Algorithm': ['XGBoost', 'DNN', 'Perceptron'],
    'MSE': [63.3234, 51.8182, 78.231],
    'Cost': [12, 38, 8]
})
df

Unnamed: 0,Algorithm,MSE,Cost
0,XGBoost,63.3234,12
1,DNN,51.8182,38
2,Perceptron,78.231,8


In [None]:
df = pd.eval('RMSE = df.MSE ** 0.5', target=df)
df

Unnamed: 0,Algorithm,MSE,Cost,RMSE
0,XGBoost,63.3234,12,7.9576
1,DNN,51.8182,38,7.198486
2,Perceptron,78.231,8,8.844829
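
Dataframes also have their own `eval` method, where the columns can be used directly by name. A sketch of the same RMSE calculation with `df.eval`; the name `scores` is used here so that the `df` from the cells above is left alone.

```python
import pandas as pd

scores = pd.DataFrame({
    'Algorithm': ['XGBoost', 'DNN', 'Perceptron'],
    'MSE': [63.3234, 51.8182, 78.231],
    'Cost': [12, 38, 8]
})

# Returns a NEW dataframe with the extra column; scores itself is unchanged
with_rmse = scores.eval('RMSE = MSE ** 0.5')
print(with_rmse)
```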


## inplace=True 

Most pandas methods return a new dataframe and keep the original intact. To keep a change we have to either assign the result to a variable, or explicitly tell pandas to change our dataframe by passing inplace=True


In [None]:
df = df.set_index('Algorithm')


Algorithm,MSE,Cost
XGBoost,63.3234,12
DNN,51.8182,38
Perceptron,78.231,8
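
A sketch of the two options side by side, on a shortened version of the dataframe (the name `algos` is only used here):

```python
import pandas as pd

algos = pd.DataFrame({'Algorithm': ['XGBoost', 'DNN'], 'Cost': [12, 38]})

# Without inplace: we get a new dataframe back, algos is untouched
returned = algos.set_index('Algorithm')
still_a_column = 'Algorithm' in algos.columns

# With inplace=True: algos itself is changed and the call returns None
result = algos.set_index('Algorithm', inplace=True)

print(still_a_column)
print(result)
print(algos)
```

Reassigning (`df = df.set_index(...)`) is usually the easier pattern to read, since it is visible on the line itself that `df` changes.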


## df.query

Like pd.eval, df.query takes an expression written as a string with Python syntax. It returns the rows for which the expression is true


In [None]:
df.query('MSE > 50 and Cost > 10')

Algorithm,MSE,Cost
XGBoost,63.3234,12
DNN,51.8182,38
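
Inside the query string we can also refer to ordinary Python variables by putting `@` in front of the name. A sketch with a made-up threshold `max_cost`, next to the equivalent boolean mask:

```python
import pandas as pd

scores = pd.DataFrame({
    'Algorithm': ['XGBoost', 'DNN', 'Perceptron'],
    'MSE': [63.3234, 51.8182, 78.231],
    'Cost': [12, 38, 8]
})

max_cost = 20  # hypothetical budget

# @max_cost reads the Python variable defined above
cheap = scores.query('MSE > 50 and Cost < @max_cost')

# The same selection written with a boolean mask
cheap_mask = scores[(scores['MSE'] > 50) & (scores['Cost'] < max_cost)]

print(cheap)
```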
