### What Can Pandas Do?
Pandas gives you answers about the data, such as:

Is there a correlation between two or more columns?

What is the average value?

What is the maximum value?

What is the minimum value?

Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
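
As a rough illustration of the questions listed above, the sketch below computes a correlation, an average, a maximum and a minimum, and then drops rows with missing values. The `workouts` table and its numbers are made up for this example and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical workout data; the None marks a missing measurement.
workouts = pd.DataFrame({
    "duration": [60, 45, 30, 45],
    "calories": [400.0, 300.0, None, 310.0],
})

print(workouts.corr())               # pairwise correlation between the columns
print(workouts["duration"].mean())   # average value
print(workouts["duration"].max())    # largest value
print(workouts["duration"].min())    # smallest value

# Cleaning: drop every row that contains a missing value.
cleaned = workouts.dropna()
print(len(cleaned))
```

`dropna()` returns a new DataFrame and leaves `workouts` unchanged, so the original data is still available for comparison.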

In [98]:
import pandas

In [99]:
mydataset = {
    'cars': ['BMW','FORD','VOLVO'],
    'passing':[1,2,3]
}

In [100]:
var = pandas.DataFrame(mydataset)

In [101]:
var

    cars  passing
0    BMW        1
1   FORD        2
2  VOLVO        3


In [102]:
import pandas as pd

In [103]:
var = pd.DataFrame(mydataset)

In [104]:
var

    cars  passing
0    BMW        1
1   FORD        2
2  VOLVO        3


In [105]:
pd.__version__

'1.4.4'

# SERIES

In [106]:
import pandas as pd

In [107]:
a = [1,2,3,4]

In [108]:
pd.Series(a)

0    1
1    2
2    3
3    4
dtype: int64

In [109]:
a[0]

1

In [110]:
a[3]

4
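
The two cells above index the plain Python list `a`, not the Series. A minimal sketch of the same lookup on the Series itself, assuming the Series is stored in a variable (here called `s`, a name chosen only for this example):

```python
import pandas as pd

a = [1, 2, 3, 4]
s = pd.Series(a)

# With no index argument, the labels are simply 0, 1, 2, ...
print(list(s.index))

# Look values up by label on the Series itself.
print(s[0])
print(s[3])
```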

### Create Labels
With the index argument, you can name your own labels.


In [111]:
a = [1,2,3,4,5]

In [112]:
var = pd.Series(a, index=['a','b','c','d','e'])

In [113]:
var

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [114]:
var['a']

1

In [115]:
var['c']

3

### Key/Value Objects as Series
You can also use a key/value object, like a dictionary, when creating a Series.


In [116]:
calories = {"day1": 420, "day2": 380, "day3": 390}

In [117]:
var = pd.Series(calories)

In [118]:
var

# Note: The keys of the dictionary become the labels.

day1    420
day2    380
day3    390
dtype: int64

In [119]:
var['day1']

420

##### To select only some of the items in the dictionary, use the index argument and specify only the items you want to include in the Series.

In [120]:
calories = {"day1": 420, "day2": 380, "day3": 390}

In [121]:
var = pd.Series(calories, index=['day1','day3'])

In [122]:
var

day1    420
day3    390
dtype: int64
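
One detail worth knowing when selecting items this way: a label in the index argument that is not a key of the dictionary is still included in the Series, with a missing value (NaN). The sketch below uses `day4` as a hypothetical label that does not exist in `calories`.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is a hypothetical label that is not a key in the dictionary.
partial = pd.Series(calories, index=["day1", "day4"])

print(partial)
print(partial.isna().tolist())  # which entries are missing
```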

# DataFrames

Data sets in Pandas are usually multi-dimensional tables, called DataFrames.

A Series is like a column; a DataFrame is the whole table.

In [123]:
import pandas as pd

In [124]:
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

In [125]:
df = pd.DataFrame(data)

In [126]:
df

   calories  duration
0       420        50
1       380        40
2       390        45
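
The remark that a Series is like a column can be checked directly: selecting a single column from the DataFrame gives back a Series. A small sketch using the same data (the variable name `column` is just for illustration):

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)

# Selecting one column by name returns a Series.
column = df["calories"]

print(type(df).__name__)      # the whole table
print(type(column).__name__)  # a single column
print(column.tolist())
```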


### Locate Row
As you can see from the result above, the DataFrame is like a table with rows and columns.

Pandas uses the loc attribute to return one or more specified rows.



In [127]:
df.loc[0]

calories    420
duration     50
Name: 0, dtype: int64

In [128]:
df.loc[1]

# Note: This example returns a Pandas Series.

calories    380
duration     40
Name: 1, dtype: int64

In [129]:
df.loc[[0,1]]

# Return row 0 and 1:

   calories  duration
0       420        50
1       380        40


In [130]:
# Note: When you pass a list of labels, as in df.loc[[0,1]], the result is a Pandas DataFrame.
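
To make the Series versus DataFrame distinction concrete, the sketch below compares the type returned for a single label with the type returned for a list of labels. The variable names `row` and `one_row_frame` are chosen only for this example.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)

row = df.loc[0]              # single label: a Series
one_row_frame = df.loc[[0]]  # list of labels (even just one): a DataFrame

print(type(row).__name__)
print(type(one_row_frame).__name__)
print(one_row_frame.shape)   # (rows, columns)
```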

### Named Indexes
With the index argument, you can name your own indexes.



In [131]:
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

In [132]:
df = pd.DataFrame(data, index=['day1','day2','day3'])

In [133]:
df

      calories  duration
day1       420        50
day2       380        40
day3       390        45


### Locate Named Indexes
Use the named index in the loc attribute to return the specified row(s).

In [134]:
df.loc['day2']

calories    380
duration     40
Name: day2, dtype: int64
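
Named indexes work with the other forms of loc as well. The following sketch shows a list of labels, a label slice, and a row/column pair; note that label slices with loc include both endpoints. The variable names are illustrative only.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

subset = df.loc[["day1", "day3"]]   # list of labels: selected rows as a DataFrame
span = df.loc["day1":"day2"]        # label slice: both endpoints are included
value = df.loc["day2", "calories"]  # row label and column name: a single value

print(subset)
print(span)
print(value)
```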

# Pandas Read CSV

In [135]:
import pandas as pd

In [136]:
df = pd.read_csv('data.csv')

In [137]:
df.head()

   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0


In [138]:
# print(df.to_string())

# Tip: use to_string() to print the entire DataFrame.
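
If data.csv is not at hand, the same calls can be tried on CSV text held in memory. This is only a sketch: the three rows below are copied from the head() output above, and `csv_text` stands in for the real file.

```python
import io

import pandas as pd

# Stand-in for data.csv: a few rows typed inline so the sketch runs anywhere.
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
60,103,135,340.0
"""

# read_csv accepts a file path or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))      # only the first two rows
print(df.to_string())  # the entire DataFrame as text
```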

## max_rows
The number of rows returned is defined in Pandas option settings.

You can check your system's maximum number of rows with the pd.options.display.max_rows statement.

In [139]:
print(pd.options.display.max_rows)

# The Pandas default is 60, which means that if the DataFrame contains more than 60 rows,
# the print(df) statement will print only the headers and the first and last 5 rows.
# The 9999 shown below presumably reflects the option having been raised earlier in this session.

9999


In [140]:
pd.options.display.max_rows = 9999

In [141]:
df = pd.read_csv('data.csv')

In [142]:
#df
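
Setting pd.options.display.max_rows changes the option for the rest of the session. As an alternative sketch, pd.option_context applies a setting only inside a with block and restores the previous value afterwards. The 100-row frame here is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"n": range(100)})  # hypothetical 100-row frame

before = pd.get_option("display.max_rows")

# The option is changed only inside this block.
with pd.option_context("display.max_rows", 10):
    inside = pd.get_option("display.max_rows")
    short_repr = repr(df)  # truncated, since 100 rows exceed the limit of 10

after = pd.get_option("display.max_rows")  # restored to the earlier value

print(short_repr)
print(before, inside, after)

# to_string() ignores the display limit and renders every row.
full_text = df.to_string()
print(len(full_text.splitlines()))
```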

# Pandas Read JSON

In [143]:
import pandas as pd

In [148]:
data = {
  "Duration":{
    "0":60,
    "1":60,
    "2":60,
    "3":45,
    "4":45,
    "5":60
  },
  "Pulse":{
    "0":110,
    "1":117,
    "2":103,
    "3":109,
    "4":117,
    "5":102
  },
  "Maxpulse":{
    "0":130,
    "1":145,
    "2":135,
    "3":175,
    "4":148,
    "5":127
  },
  "Calories":{
    "0":409,
    "1":479,
    "2":340,
    "3":282,
    "4":406,
    "5":300
  }
}

In [150]:
df = pd.DataFrame(data)

In [151]:
df

   Duration  Pulse  Maxpulse  Calories
0        60    110       130       409
1        60    117       145       479
2        60    103       135       340
3        45    109       175       282
4        45    117       148       406
5        60    102       127       300
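
The cells above build the DataFrame from a Python dictionary that has the same shape as a JSON object. To read actual JSON text, pd.read_json can be used. The sketch below serializes a shortened copy of the dictionary with json.dumps and reads it back from memory; with a real file you would pass its path instead (no file name is assumed here).

```python
import io
import json

import pandas as pd

# Shortened copy of the dictionary above, used only for this sketch.
data = {
    "Duration": {"0": 60, "1": 60, "2": 45},
    "Pulse": {"0": 110, "1": 117, "2": 109},
}

json_text = json.dumps(data)  # the dictionary as a JSON string

# read_json accepts a path or a file-like object such as StringIO.
df = pd.read_json(io.StringIO(json_text))

print(df)
```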


In [152]:
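df.info

# Note: Without parentheses, df.info only returns the bound method, as shown below. Call df.info() to print the column summary.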

<bound method DataFrame.info of    Duration  Pulse  Maxpulse  Calories
0        60    110       130       409
1        60    117       145       479
2        60    103       135       340
3        45    109       175       282
4        45    117       148       406
5        60    102       127       300>
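
The output above is the method object itself, because info was referenced without being called. A minimal sketch of calling it, using a small made-up frame with one missing value so the non-null counts differ between columns:

```python
import io

import pandas as pd

# Hypothetical data; None becomes a missing value in the Calories column.
df = pd.DataFrame({
    "Duration": [60, 60, 45],
    "Calories": [409.1, None, 282.4],
})

# info() prints a summary (column names, non-null counts, dtypes) and returns None.
result = df.info()

# The buf argument redirects the summary into a text buffer instead of printing it.
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()
print(summary)
```

Comparing the non-null count of each column with the total number of entries is a quick way to spot columns that need cleaning.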