# What is it?
Pandas is a powerful data analysis library for Python. According to Andrei, we can, in a certain way, say Pandas is Python's Excel. Well, let's see what it can do.

# Pandas Series
A Pandas Series is basically a column. It does not have to contain only one type of data, but it's better if it does.

In [3]:
import pandas as pd

my_list = list(range(1, 11, 2))
print(f'The Python list: {my_list}')

s1 = pd.Series(my_list)
print(f'The Panda serie:\n{s1}')

The Python list: [1, 3, 5, 7, 9]
The Panda serie:
0    1
1    3
2    5
3    7
4    9
dtype: int64


Notice that, at the end, the type of data (`dtype`) is displayed. Let's see what happens if we feed the Series another type of data.

In [4]:
my_list2 = list('abcdef')
print(f'The Python list: {my_list2}')

s2 = pd.Series(my_list2)
print(f'The Panda serie:\n{s2}')

The Python list: ['a', 'b', 'c', 'd', 'e', 'f']
The Panda serie:
0    a
1    b
2    c
3    d
4    e
5    f
dtype: object


Let's try a mixed Series.

In [5]:
my_list3 = ['a', 1]
print(f'The Python list: {my_list3}')

s3 = pd.Series(my_list3)
print(f'The Panda serie:\n{s3}')

The Python list: ['a', 1]
The Panda serie:
0    a
1    1
dtype: object
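
Here's a small sketch of why one type per Series is nicer. When every value is numeric, Pandas picks a numeric `dtype`; as soon as a string sneaks in, it falls back to the generic `object`. The names `numbers` and `mixed` are made up for this example.

```python
import pandas as pd

numbers = pd.Series([1, 2.5])   # int and float together: upcast to float64
mixed = pd.Series(['a', 1])     # str and int together: generic object dtype

print(numbers.dtype)
print(mixed.dtype)
```
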


Wanna change the index? Piece of cake!

In [6]:
values = list(range(5))
index = list('abcde')

s4 = pd.Series(values, index=index)
s4

a    0
b    1
c    2
d    3
e    4
dtype: int64
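
With a custom index you can reach a value by its label or by its position. A minimal sketch with the same values as above; `s`, `by_label` and `by_position` are names chosen just for this example.

```python
import pandas as pd

# Values 0..4 labelled 'a'..'e'
s = pd.Series(list(range(5)), index=list('abcde'))

by_label = s.loc['c']       # look up by index label
by_position = s.iloc[2]     # look up by integer position
print(by_label, by_position)
```

Both lookups land on the same element here, because `'c'` is the third label.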

You can also use tuples and dictionaries to create Series. Sets are not allowed, though, because they're not ordered.

In [7]:
my_set = {1, 2, 3}
s_set = pd.Series(my_set)

TypeError: 'set' type is unordered
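
If you do have a set, one possible workaround is to give it an order first, for instance with `sorted()`. The sketch below also shows the tuple case; `my_tuple`, `s_tuple` and `s_from_set` are names made up for the example.

```python
import pandas as pd

my_tuple = (1, 2, 3)
s_tuple = pd.Series(my_tuple)            # tuples are ordered, so this works

my_set = {3, 1, 2}
s_from_set = pd.Series(sorted(my_set))   # sorted() returns an ordered list
print(s_from_set.tolist())
```
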

In [None]:
my_dict = {'k1':10, 'k2':20, 'k3':30}
s_dict = pd.Series(my_dict)
s_dict

You can get a value in a Pandas Series by referring to it via the index, just as you'd do with a dictionary.

In [None]:
s_dict['k2']
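
The dictionary analogy goes a bit further. A short sketch, assuming the same `s_dict` as above: `in` checks the index labels (not the values), and `get` returns a default instead of raising `KeyError`. The label `'k4'` is deliberately one that doesn't exist.

```python
import pandas as pd

s_dict = pd.Series({'k1': 10, 'k2': 20, 'k3': 30})

print(s_dict['k2'])           # label lookup
print('k4' in s_dict)         # membership looks at the index labels
print(s_dict.get('k4', 0))    # default value when the label is missing
```
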

You can add Series to one another. Pandas matches the index labels before adding. If a label exists in only one of the Series, the result for that label is NaN.

In [8]:
dict1 = dict(zip(list('abc'), list(range(3))))
dict2 = dict(zip(list('dcba'), list(range(10,17, 2))))

s1 = pd.Series(dict1)
s2 = pd.Series(dict2)
print(dict1, dict2)

print(s1 + s2)

{'a': 0, 'b': 1, 'c': 2} {'d': 10, 'c': 12, 'b': 14, 'a': 16}
a    16.0
b    15.0
c    14.0
d     NaN
dtype: float64
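
If you'd rather treat a missing label as zero instead of getting NaN, the `add` method takes a `fill_value`. A minimal sketch with the same two Series; `total` is a name made up for the example.

```python
import pandas as pd

s1 = pd.Series({'a': 0, 'b': 1, 'c': 2})
s2 = pd.Series({'d': 10, 'c': 12, 'b': 14, 'a': 16})

# Labels missing from one side are filled with 0 before adding
total = s1.add(s2, fill_value=0)
print(total)
```

Now `d` comes out as 10.0 instead of NaN. The result is still `float64`, since the missing entry is filled in as a float before the addition.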


# Pandas DataFrame
A DataFrame is the most common Pandas data structure. It's a two-dimensional table, which is actually a group of Pandas Series. Look.

In [9]:
people = [['Annie', 30, 10000], ['Saul', 40, 15000], ['Peter', 24, 6900], ['John', 33, 15000]]

df = pd.DataFrame(people, columns=['Name', 'Age', 'Income'])
df

    Name  Age  Income
0  Annie   30   10000
1   Saul   40   15000
2  Peter   24    6900
3   John   33   15000


Remember I said a DataFrame is a group of Series? Well, check this out!

In [10]:
print(type(df))
print(type(df['Name']))
print(type(df['Name'][0]))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
<class 'str'>
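
It works the other way around too: you can build a DataFrame out of Series. A small sketch, where `names`, `ages` and `df_from_series` are names made up for the example; each dictionary key becomes a column name.

```python
import pandas as pd

names = pd.Series(['Annie', 'Saul', 'Peter', 'John'])
ages = pd.Series([30, 40, 24, 33])

# Each Series becomes one column; rows are aligned on the shared index 0..3
df_from_series = pd.DataFrame({'Name': names, 'Age': ages})
print(df_from_series.shape)
```
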


You can see how many rows and columns a DataFrame has by checking the `shape` attribute.

In [11]:
df.shape

(4, 3)

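`info` also gives some information about the DataFrame. It's a method, though, so it should be called as `df.info()`. Without the parentheses, as in the cell below, the notebook just shows the bound method along with the table itself.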

In [12]:
df.info

<bound method DataFrame.info of     Name  Age  Income
0  Annie   30   10000
1   Saul   40   15000
2  Peter   24    6900
3   John   33   15000>
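
Called with parentheses, `info` prints a summary with the index range, the columns, their non-null counts and their dtypes. A minimal sketch, assuming the same `people` data; `demo`, `buffer` and `summary` are names made up for this example, and the exact memory figure in the output depends on the Pandas version.

```python
import io
import pandas as pd

people = [['Annie', 30, 10000], ['Saul', 40, 15000], ['Peter', 24, 6900], ['John', 33, 15000]]
demo = pd.DataFrame(people, columns=['Name', 'Age', 'Income'])

demo.info()   # prints the summary directly

# To keep the summary as a string, pass a text buffer through the buf parameter
buffer = io.StringIO()
demo.info(buf=buffer)
summary = buffer.getvalue()
```
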

Wanna read a single column of the DataFrame?

In [13]:
df['Age']

0    30
1    40
2    24
3    33
Name: Age, dtype: int64

Or...

In [14]:
df.Age

0    30
1    40
2    24
3    33
Name: Age, dtype: int64
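
The two forms are not always interchangeable. A short sketch with a made-up frame (`demo` and its columns `First Name` and `count` are hypothetical): dot access can't handle names with spaces, and it loses to existing DataFrame attributes such as the `count` method.

```python
import pandas as pd

demo = pd.DataFrame({'First Name': ['Annie', 'Saul'], 'count': [1, 2]})

first_names = demo['First Name']   # brackets work for any column name
# demo.First Name                  # not valid Python: the name has a space

counts = demo['count']             # the column
count_method = demo.count          # the DataFrame.count method, not the column
print(callable(count_method))
```

So brackets are the safer habit when column names aren't simple identifiers.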

Wanna create a new column? It's similar to how you'd create a new key in a dictionary.

In [15]:
df['Gender'] = ['F', 'M', 'M', 'M']
df

    Name  Age  Income Gender
0  Annie   30   10000      F
1   Saul   40   15000      M
2  Peter   24    6900      M
3   John   33   15000      M
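
The assigned value doesn't have to be a hand-written list. A small sketch with a made-up two-row frame (`demo`, `Monthly` and `Active` are hypothetical names): a column can be computed from another one, a single value is repeated on every row, and a list of the wrong length is rejected.

```python
import pandas as pd

demo = pd.DataFrame({'Name': ['Annie', 'Saul'], 'Income': [10000, 15000]})

demo['Monthly'] = demo['Income'] / 12   # computed from an existing column
demo['Active'] = True                   # a scalar is repeated on every row

try:
    demo['Gender'] = ['F']              # one value for two rows
except ValueError as err:
    print(f'ValueError: {err}')
```
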


It's possible to delete rows and columns. You gotta use the `drop()` method. It takes two arguments: the first one is the label of the row or column you wanna drop, and the second one, `axis`, says whether that label refers to a row or a column. It can be 0, which means row, or 1, which means column. First let's get rid of Peter.

Oh! By the way! If not specified, the default axis is 0, in other words, rows.

In [16]:
df.drop(2, axis=0)  # pass axis by keyword; pandas 2.0+ rejects it as a positional argument

    Name  Age  Income Gender
0  Annie   30   10000      F
1   Saul   40   15000      M
3   John   33   15000      M


Now let's get rid of the age column.

In [17]:
df.drop('Age', axis=1)

    Name  Income Gender
0  Annie   10000      F
1   Saul   15000      M
2  Peter    6900      M
3   John   15000      M


Did you notice Peter came back? This is because Pandas is not really deleting stuff. It's actually returning a new DataFrame without the data you dropped, and the original stays untouched. Given this, what you could do is:

In [18]:
df = df.drop(2, axis=0)

And now we really got rid of Peter!

In [19]:
df

    Name  Age  Income Gender
0  Annie   30   10000      F
1   Saul   40   15000      M
3   John   33   15000      M


Or you can set the `inplace` argument to True.

In [20]:
df.drop('Age', axis=1, inplace=True)

And now we got rid of the age column.

In [21]:
df

    Name  Income Gender
0  Annie   10000      F
1   Saul   15000      M
3   John   15000      M
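
If the 0/1 axis numbers are hard to remember, `drop` also accepts the `index` and `columns` keywords, which say the same thing by name. A minimal sketch on a made-up frame (`demo` and the `without_*` names are hypothetical); none of these calls changes `demo` itself.

```python
import pandas as pd

demo = pd.DataFrame(
    [['Annie', 30], ['Saul', 40], ['Peter', 24]],
    columns=['Name', 'Age'],
)

without_peter = demo.drop(index=2)        # same effect as axis=0
without_age = demo.drop(columns='Age')    # same effect as axis=1

# Several labels, and both directions, in a single call
only_saul_name = demo.drop(index=[0, 2], columns=['Age'])
print(only_saul_name)
```
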


Do you want to rename columns? It's done this way.

In [22]:
df.rename(columns={'Name': 'First Name', 'Income': 'Salary'}, inplace=True)
df

  First Name  Salary Gender
0      Annie   10000      F
1       Saul   15000      M
3       John   15000      M
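
One thing to watch with `rename`: by default, a key that matches no column is silently ignored, so a typo goes unnoticed. The sketch below, on a made-up frame (`demo`, `unchanged` and `upper` are hypothetical names), shows that behavior, the `errors='raise'` option that turns it into a `KeyError`, and renaming with a function instead of a dictionary.

```python
import pandas as pd

demo = pd.DataFrame({'Name': ['Annie'], 'Income': [10000]})

# 'Salary' is not a column here, so nothing is renamed and nothing fails
unchanged = demo.rename(columns={'Salary': 'Pay'})

# errors='raise' makes the unknown key an error instead
try:
    demo.rename(columns={'Salary': 'Pay'}, errors='raise')
except KeyError as err:
    print(f'KeyError: {err}')

# A function is applied to every column name
upper = demo.rename(columns=str.upper)
print(list(upper.columns))
```
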


Yup! Just like a dictionary! You could also alter them by assigning to the `columns` attribute! You gotta list all the columns, though. Yeah... Even the ones you don't wanna rename. It's positional.

In [23]:
df.columns = ['Name', 'Annual Income', 'Gender']
df

    Name  Annual Income Gender
0  Annie          10000      F
1   Saul          15000      M
3   John          15000      M
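
Here's a quick sketch of what "positional" implies, on a made-up one-row frame (`demo` is a hypothetical name): a list with the wrong number of names is rejected, and a list in the wrong order is accepted without complaint, which puts the labels on the wrong data.

```python
import pandas as pd

demo = pd.DataFrame({'Name': ['Annie'], 'Salary': [10000], 'Gender': ['F']})

try:
    demo.columns = ['Name', 'Annual Income']   # two names for three columns
except ValueError as err:
    print(f'ValueError: {err}')

# Right length, wrong order: no error, but 'Gender' now labels the names
demo.columns = ['Gender', 'Name', 'Annual Income']
print(demo['Gender'].tolist())
```
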
