# Pandas Learning File

In [12]:
# importing pandas and numpy
import pandas as pd
import numpy as np

### Creating a Series from a List

- You can create a pandas Series directly from a list. When no index is given, pandas assigns a default integer index (0, 1, 2, ...).


In [17]:
my_data = [10, 20, 30]
series = pd.Series(data=my_data)
series

0    10
1    20
2    30
dtype: int64
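As a small check of the default index described above, the sketch below rebuilds the same Series and inspects its index. The variable names are the ones used in this notebook.

```python
import pandas as pd

my_data = [10, 20, 30]
series = pd.Series(my_data)

# With no index argument, pandas uses a RangeIndex that starts at 0.
print(series.index)           # RangeIndex(start=0, stop=3, step=1)
print(series.index.tolist())  # [0, 1, 2]
print(series.tolist())        # [10, 20, 30]
```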

### Creating a Series with a Custom Index

- You can specify custom labels for the index of the Series by passing a list of labels as the `index` argument.


In [20]:
labels = ['a', 'b', 'c']
series = pd.Series(data=my_data, index=labels)
series

a    10
b    20
c    30
dtype: int64
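One thing to watch for when passing a list of labels: the number of labels has to match the number of values. The sketch below deliberately passes too few labels to show the error; `length_error` is just an illustrative variable name.

```python
import pandas as pd

my_data = [10, 20, 30]

# Three values but only two labels: pandas refuses to build the Series.
try:
    pd.Series(data=my_data, index=['a', 'b'])
    length_error = None
except ValueError as exc:
    length_error = exc

print(type(length_error).__name__)  # ValueError
```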

### Creating a Series from a NumPy Array

- You can create a Series by passing a NumPy array to the `pd.Series()` constructor. As with a list, you can supply custom labels through the `index` argument.


In [23]:
arr = np.array(my_data)
series = pd.Series(arr, index=labels)
series

a    10
b    20
c    30
dtype: int32
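The `int32` in the output above differs from the `int64` shown for the list-based Series. A likely explanation is that a Series keeps the dtype of the NumPy array it is built from, and NumPy's default integer type depends on the platform and NumPy version. If you want the same dtype everywhere, one option is to set it explicitly on the array, as in this sketch:

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']

# Fix the dtype on the array so it does not depend on the platform default.
arr = np.array([10, 20, 30], dtype=np.int64)
series = pd.Series(arr, index=labels)

print(series.dtype)  # int64
```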

### Creating a Series from a Dictionary

- You can create a Series directly from a dictionary where keys become the index, and values become the data.


In [26]:
d = {'a': 10, 'b': 20, 'c': 30}
series = pd.Series(d)
series

a    10
b    20
c    30
dtype: int64
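If you pass both a dictionary and an `index`, pandas looks up each label in the dictionary instead of using the keys in their original order. The sketch below uses a made-up label `'z'` that is not in the dictionary to show what happens to missing keys.

```python
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# Labels are looked up in the dict; 'b' is left out and 'z' has no value.
series = pd.Series(d, index=['c', 'a', 'z'])

print(series.index.tolist())   # ['c', 'a', 'z']
print(series['c'])             # 30.0
print(series.isna().tolist())  # [False, False, True]
print(series.dtype)            # float64 (NaN forces a float dtype)
```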

### Accessing Elements of a Series

- You can access an individual element of a Series by its index label, using the label as the key inside square brackets.


In [29]:
element = series['a']
element

10
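Besides plain square brackets, a Series also supports `.loc` (label-based) and `.iloc` (position-based) access. The sketch below shows a few variations on the same Series; the label `'z'` is a made-up example of a label that does not exist.

```python
import pandas as pd

series = pd.Series({'a': 10, 'b': 20, 'c': 30})

by_label = series.loc['a']     # explicit label-based access
by_position = series.iloc[0]   # explicit position-based access
several = series[['a', 'c']]   # a list of labels returns a Series
missing = series.get('z')      # .get returns None instead of raising KeyError

print(by_label, by_position)   # 10 10
print(several.tolist())        # [10, 30]
print(missing)                 # None
```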

### Creating and Adding Two Series

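- You can create two Series with different indices and then perform arithmetic operations like addition. pandas aligns the values by index label, so a label that appears in only one of the two Series gets `NaN` in the result. Because `NaN` is a floating-point value, the result has dtype `float64`.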


In [34]:
series1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
series2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])
result = series1 + series2
result

Germany    4.0
Italy      NaN
Japan      8.0
USA        2.0
USSR       NaN
dtype: float64
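If you would rather treat a missing label as zero than get `NaN`, one option is the `add` method with `fill_value`. This is a sketch of that alternative using the same two Series, not a replacement for the `+` operator shown above.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
series2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

plain = series1 + series2                   # NaN for 'Italy' and 'USSR'
filled = series1.add(series2, fill_value=0) # missing side is treated as 0

print(plain.isna().sum())  # 2
print(filled['Italy'])     # 5.0
print(filled['USSR'])      # 3.0
```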

### Creating a Pandas DataFrame

- **DataFrames** are the most commonly used data structure in pandas.
- They are two-dimensional, size-mutable, and labeled data structures, similar to a table in a relational database.
- You can create a DataFrame from NumPy arrays or other data sources.
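As an example of one of the "other data sources", here is a minimal sketch that builds a DataFrame from a dictionary of lists. The values and the name `df_small` are made up for illustration; each key becomes a column.

```python
import pandas as pd

# Hypothetical data: each key is a column name, each list is that column's values.
data = {'W': [1.0, 2.0, 3.0], 'X': [4.0, 5.0, 6.0]}
df_small = pd.DataFrame(data, index=['A', 'B', 'C'])

print(df_small.shape)             # (3, 2)
print(df_small.columns.tolist())  # ['W', 'X']
print(df_small.loc['B', 'X'])     # 5.0
```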


In [37]:
from numpy.random import randn
# Seed the random number generator with 101 (the value my instructor uses) so the results are reproducible
np.random.seed(101)


In [39]:
df = pd.DataFrame(randn(5, 4), index=['A', 'B', 'C', 'D', 'E'], columns=['W', 'X', 'Y', 'Z'])
df

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509


In [41]:
# Accessing a single column of a DataFrame (the result is a Series)
column = df['W']
column

A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64

In [43]:
# Accessing multiple columns by passing a list of names (the result is a DataFrame)
columns = df[['W', 'Z']]
columns

          W         Z
A  2.706850  0.503826
B  0.651118  0.605965
C -2.018168 -0.589001
D  0.188695  0.955057
E  0.190794  0.683509
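The two cells above return different types: a single name gives a Series, while a list of names gives a DataFrame, even when the list has only one name. The sketch below rebuilds the same DataFrame and checks this; `single` and `frame` are illustrative names.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

single = df['W']    # one name        -> Series
frame = df[['W']]   # list with one name -> DataFrame with one column

print(type(single).__name__)  # Series
print(type(frame).__name__)   # DataFrame
print(frame.shape)            # (5, 1)
```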


### Adding a New Column to a DataFrame

- You can create a new column in a DataFrame by assigning values to it. 
- For example, you can compute the sum of two existing columns to create a new column.

In [46]:
df['new'] = df['W'] + df['Y']
df

          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  3.614819
B  0.651118 -0.319318 -0.848077  0.605965 -0.196959
C -2.018168  0.740122  0.528813 -0.589001 -1.489355
D  0.188695 -0.758872 -0.933237  0.955057 -0.744542
E  0.190794  1.978757  2.605967  0.683509  2.796762
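Assigning with `df['new'] = ...` changes `df` itself. If you want to keep the original untouched, one alternative is `assign`, which returns a new DataFrame. A minimal sketch, rebuilding the same data (`df_with_new` is an illustrative name):

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

# assign returns a copy with the extra column; df is not modified.
df_with_new = df.assign(new=df['W'] + df['Y'])

print('new' in df.columns)           # False
print('new' in df_with_new.columns)  # True
```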


### Removing a Column from a DataFrame

- Use the `drop` method to remove a column.
- Set `axis=1` for columns.
- Use `inplace=True` to modify the DataFrame directly. Without it, `drop` returns a new DataFrame and leaves the original unchanged.

In [52]:
df.drop('new', axis=1, inplace=True)
df

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509
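To see what happens without `inplace=True`, the sketch below drops the column and stores the result under a new name. It also uses the `columns=` keyword, which does the same job as `axis=1`. The name `trimmed` is illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])
df['new'] = df['W'] + df['Y']

# Without inplace=True, drop returns a new DataFrame.
trimmed = df.drop(columns='new')   # same effect as df.drop('new', axis=1)

print(df.columns.tolist())       # ['W', 'X', 'Y', 'Z', 'new']
print(trimmed.columns.tolist())  # ['W', 'X', 'Y', 'Z']
```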


### Removing a Row from a DataFrame

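- Use the `drop` method to remove a row.
- Set `axis=0` for rows. This is the default, so it can be omitted.
- Use `inplace=True` to modify the DataFrame directly. Without it, `drop` returns a new DataFrame and leaves the original unchanged.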


In [55]:
df.drop('E', axis=0, inplace=True)
df

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
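`drop` also accepts a list of labels, and the `index=` keyword can be used instead of `axis=0`. The sketch below removes two rows at once and shows `errors='ignore'` for a label that does not exist (`'Q'` is a made-up label).

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

# Drop several rows at once; a new DataFrame is returned.
without_de = df.drop(index=['D', 'E'])

# errors='ignore' skips labels that are not in the index instead of raising KeyError.
without_e = df.drop(index=['E', 'Q'], errors='ignore')

print(without_de.index.tolist())  # ['A', 'B', 'C']
print(without_e.index.tolist())   # ['A', 'B', 'C', 'D']
print(df.shape)                   # (5, 4), the original is unchanged
```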


### Accessing Rows of a DataFrame

- Use `loc` to access rows by their index label.
- Use `iloc` to access rows by their integer position (starting at 0), regardless of the labels.


In [66]:
# Accessing a row by label
row = df.loc['A']
row

W    2.706850
X    0.628133
Y    0.907969
Z    0.503826
Name: A, dtype: float64

In [68]:
# Accessing a row by integer position
row = df.iloc[0]
row

W    2.706850
X    0.628133
Y    0.907969
Z    0.503826
Name: A, dtype: float64
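Both `loc` and `iloc` also accept a second argument for the columns, so you can pick a single cell or a block of rows and columns. The sketch below rebuilds the five-row DataFrame and selects the same block both ways; `cell`, `block`, and `same_block` are illustrative names.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

cell = df.loc['B', 'Y']                    # single value by row and column label
block = df.loc[['A', 'B'], ['W', 'Y']]     # rows A, B and columns W, Y by label
same_block = df.iloc[0:2, [0, 2]]          # the same block by integer position

# Label slices with loc include the end label; iloc slices exclude the end position.
label_slice = df.loc['A':'B']

print(block.shape)               # (2, 2)
print(block.equals(same_block))  # True
print(label_slice.index.tolist())  # ['A', 'B']
```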