Main source: https://pandas.pydata.org/docs/

Pandas is a data science library (note "data science", not AI as a whole), so it mostly deals with the boring but essential basics of handling data.

The core idea of pandas is that everything is stored in either a **Series** or a **DataFrame**.

In [2]:
import pandas as pd # imports the pandas module and aliases it as pd, which is less pain on the wrist to type

https://pandas.pydata.org/docs/reference/api/pandas.Series.html

First up is Series. It sounds cool, but it is practically a one-dimensional array with an index (the labels in the left column of the output).

In [6]:
pd.Series([1, 2, "yo", 4.6]) # notice the dtype (data type) is an object, this is because we mixed types

0      1
1      2
2     yo
3    4.6
dtype: object

In [8]:
pd.Series([1, 2, 3, 4]) # pandas infers the dtype from the values

0    1
1    2
2    3
3    4
dtype: int64
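To make the inference rule a bit more concrete, here is a small sketch comparing three inputs. The variable names (`ints`, `floats`, `mixed`) are only for illustration and are not part of the notes above.

```python
import pandas as pd

# With no dtype argument, pandas picks a dtype from the values it is given.
ints = pd.Series([1, 2, 3, 4])
floats = pd.Series([1, 2.5, 3])        # a single float upcasts the whole Series
mixed = pd.Series([1, 2, "yo", 4.6])   # no common numeric type, so it falls back to object

print(ints.dtype)    # int64
print(floats.dtype)  # float64
print(mixed.dtype)   # object
```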

Changing __our_data_type__ to int would fail here, because the string "i am illegal" cannot be converted to an integer (short of reinterpreting the raw bytes at a low level, which we do not need or care about).

In [14]:
our_data_type = str
pd.Series(["i am illegal"], dtype=our_data_type) # the dtype is set explicitly here instead of being inferred

0    i am illegal
dtype: object
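As a rough check of the claim that int would break, the sketch below tries it and catches the error. Which exception class pandas raises is an assumption here (it catches both `ValueError` and `TypeError`), and the names `error` and `converted` are made up for this example.

```python
import pandas as pd

# Assumption: the failed conversion surfaces as a ValueError or a TypeError.
try:
    pd.Series(["i am illegal"], dtype=int)
    error = None
except (ValueError, TypeError) as exc:
    error = exc

print(type(error).__name__, error)

# Strings that look like integers convert without trouble.
converted = pd.Series(["1", "2", "3"]).astype(int)
print(converted.sum())  # 6
```

So the problem is not strings as such, only strings that do not spell out a number.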

A Series can be built from most iterables, so Python tuples, lists, and dictionaries all work with zero problems (sets do not work though, because they are unordered).

This matters later on, because it means we can also pass in NumPy arrays.
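A quick sketch of the two edge cases mentioned above: a NumPy array going in, and a set being rejected. The variable names are illustrative, and sorting the set into a list first is just one possible workaround.

```python
import numpy as np
import pandas as pd

# A NumPy array is accepted directly and keeps its dtype.
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))
print(from_array.dtype)  # float64

# A set is rejected because it has no defined order.
try:
    pd.Series({1, 2, 3})
    set_error = None
except TypeError as exc:
    set_error = exc
print(set_error)

# Workaround: turn the set into an ordered list first.
from_sorted_set = pd.Series(sorted({3, 1, 2}))
print(from_sorted_set.tolist())  # [1, 2, 3]
```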

In [44]:
print("Series from Lists")
print(pd.Series([1, 2, 3, 4], name="cool series")) # name labels the Series; it becomes the column name when the Series goes into a DataFrame
print()
print("Series from Tuple")
print(pd.Series((1, 2, 3, 4)))
print()
print("Series from Dictionary, using keys as index")
print(pd.Series({1:1, 3:2, 2:3, 4:4})) # when we don't pass index to the Series constructor, the dictionary keys are used as the index
print()
print("Series from Dictionary, using the index parameter to select and order the keys")
print(pd.Series({1:1, 3:2, 2:3, 4:4}, index=(4, 3, 2, 1))) # the keys are not ignored: each index label is looked up in the dictionary, so the output follows the order of index
print()

Series from Lists
0    1
1    2
2    3
3    4
Name: cool series, dtype: int64

Series from Tuple
0    1
1    2
2    3
3    4
dtype: int64

Series from Dictionary, using keys as index
1    1
3    2
2    3
4    4
dtype: int64

Series from Dictionary, using the index parameter to select and order the keys
4    4
3    2
2    3
1    1
dtype: int64
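The order in that last output is not random: when a dictionary and an index are both given, pandas looks each index label up among the dictionary keys. A minimal sketch, using a made-up `scores` dictionary, shows what happens when a label is missing:

```python
import pandas as pd

scores = {"a": 1, "b": 2, "c": 3}

# Each index label is looked up in the dictionary; "z" is not a key, so it becomes NaN.
picked = pd.Series(scores, index=["c", "a", "z"])
print(picked)

# "b" is dropped because it was not requested, and the dtype turns into
# float64 because NaN cannot be stored in an integer Series.
print(list(picked.index))  # ['c', 'a', 'z']
print(picked.dtype)        # float64
```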



https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

Moving on to tabular data: a DataFrame represents data as a two-dimensional table with rows and columns. DataFrames are what we use 99% of the time.

In [46]:
data = {'col1' : [1, 2, 3], 'col2' : [4, 5, 6], 'col3' : [7, 8, 9]} # keys are column names, values are iterables
pd.DataFrame(data)

   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6     9


In [50]:
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] # each inner list is one row; column names can be passed to the constructor
pd.DataFrame(data, columns=["col1", "col3", "col2"])

   col1  col3  col2
0     1     2     3
1     4     5     6
2     7     8     9
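The two constructor styles describe the same kind of table from different directions: a dictionary is read column by column, a list of lists is read row by row. The sketch below builds one table both ways; the names `by_columns` and `by_rows` are only for illustration.

```python
import pandas as pd

# Dictionary input: each key is a column, each value holds that column's entries.
by_columns = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]})

# List-of-lists input: each inner list is one row.
by_rows = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]], columns=["col1", "col2", "col3"])

print(by_columns.equals(by_rows))          # True
print(by_columns.shape)                    # (3, 3)
print(type(by_columns["col1"]).__name__)   # Series
```

The last line shows how the two main types connect: pulling one column out of a DataFrame gives back a Series.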


In [54]:
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] # each inner list is one row
dataframe = pd.DataFrame(data, columns=["col1", "col3", "col2"])
series = pd.Series([1, 2, 3, 4], name="cool series")
# axis is the direction to concatenate along (kinda like an xyz axis); axis 0 is the row direction, so the Series is appended as extra rows
print(pd.concat([dataframe, series], axis=0))
# axis 1 is the column direction, so the Series becomes a new column matched up by index, just what we need
print(pd.concat([dataframe, series], axis=1))

   col1  col3  col2  cool series
0   1.0   2.0   3.0          NaN
1   4.0   5.0   6.0          NaN
2   7.0   8.0   9.0          NaN
0   NaN   NaN   NaN          1.0
1   NaN   NaN   NaN          2.0
2   NaN   NaN   NaN          3.0
3   NaN   NaN   NaN          4.0
   col1  col3  col2  cool series
0   1.0   2.0   3.0            1
1   4.0   5.0   6.0            2
2   7.0   8.0   9.0            3
3   NaN   NaN   NaN            4
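The NaN values and the switch from integers to floats in the output above come from index alignment: the Series has four entries and the DataFrame only three rows, so the gaps are filled with NaN. A small sketch of that, reusing the same data (`stacked`, `side_by_side`, and `trimmed` are names invented for this example):

```python
import pandas as pd

dataframe = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["col1", "col3", "col2"])
series = pd.Series([1, 2, 3, 4], name="cool series")

stacked = pd.concat([dataframe, series], axis=0)       # rows are appended
side_by_side = pd.concat([dataframe, series], axis=1)  # a column is appended

print(len(stacked))                        # 7 (3 rows + 4 rows)
print(side_by_side.shape)                  # (4, 4)
print(side_by_side["col1"].isna().sum())   # 1 (index 3 has no row in the DataFrame)
print(side_by_side["col1"].dtype)          # float64, because of the NaN

# If the Series has the same index as the DataFrame, nothing is missing.
trimmed = pd.concat([dataframe, series.iloc[:3]], axis=1)
print(trimmed.isna().sum().sum())          # 0
```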


In [58]:
# loc is short for "location" and selects by label as .loc[rows, columns]; plain dataframe[column][row] indexing also works
dataframe.loc[:,"col3"] # the ":" is slice syntax meaning all rows, then "col3" fetches that specific column

0    2
1    5
2    8
Name: col3, dtype: int64
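For comparison, here is a sketch of a few ways to reach the same data. Note that `iloc` (selection by integer position) is not covered in the notes above and is included only as a point of contrast.

```python
import pandas as pd

dataframe = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["col1", "col3", "col2"])

column = dataframe.loc[:, "col3"]     # all rows, one column, selected by label
same_column = dataframe["col3"]       # shorthand for picking a single column
print(column.equals(same_column))     # True

cell = dataframe.loc[0, "col3"]       # [row label, column label]
chained = dataframe["col3"][0]        # column first, then the row label
by_position = dataframe.iloc[0, 1]    # row 0, second column, by integer position
print(cell, chained, by_position)     # 2 2 2
```

The `.loc[row, column]` form does the lookup in a single step, which is generally the safer habit once you start assigning values.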