In [2]:
import pandas as pd
import numpy as np

A **series** is a one-dimensional array of indexed data. Unlike a NumPy array, a pandas series carries an explicit index that labels each value.

In [3]:
# a numpy array
arr = np.random.randn(4)
print(type(arr))
print(arr)

<class 'numpy.ndarray'>
[ 0.61796399  0.04697904 -0.52318128 -0.15093324]


In [4]:
# a pandas series made from the previous array
s = pd.Series(arr)
print(type(s))
print(s)

<class 'pandas.core.series.Series'>
0    0.617964
1    0.046979
2   -0.523181
3   -0.150933
dtype: float64
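As a minimal sketch of the difference, the snippet below pulls the index and the underlying values out of a series separately. The array values and the name `s_demo` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen only for illustration
arr_demo = np.array([0.5, -1.0, 2.0])
s_demo = pd.Series(arr_demo)

# The index holds the labels; by default these are the integers 0..n-1
index_labels = list(s_demo.index)

# The values are still available as a plain NumPy array
values = s_demo.to_numpy()

print(s_demo.index)
print(values)
```

The series adds labels on top of the array data.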


To create a pandas series, call the `pd.Series` constructor with the data and, optionally, an index:

`s = pd.Series(data, index=index)`

In [5]:
pd.Series(np.arange(3), index= [2023, 2024, 2025])

2023    0
2024    1
2025    2
dtype: int64
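With a custom index, values can be looked up by label as well as by position. The sketch below assumes the same year-indexed series; the name `years` is hypothetical. Because the labels here are integers, `.loc` and `.iloc` make it explicit which kind of lookup is meant.

```python
import numpy as np
import pandas as pd

# Same construction as above, stored under a hypothetical name
years = pd.Series(np.arange(3), index=[2023, 2024, 2025])

# .loc looks up by index label
by_label = years.loc[2024]

# .iloc looks up by integer position
by_position = years.iloc[0]

print(by_label, by_position)
```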

In [6]:
# A series from a list of strings with a default index
pd.Series(['EDS 220', 'EDS 222', 'EDS 223', 'EDS 242'])

0    EDS 220
1    EDS 222
2    EDS 223
3    EDS 242
dtype: object

In [7]:
# make a dictionary
d = {
    'key_0':2,
    'key_1':'3',
    'key_2':5
}
pd.Series(d)

key_0    2
key_1    3
key_2    5
dtype: object
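The `object` dtype above comes from the mixed value types: `'3'` is a string while the other values are integers. A small sketch of the contrast follows; the variable names are illustrative only.

```python
import pandas as pd

# One string value among integers forces the generic object dtype
mixed = pd.Series({'key_0': 2, 'key_1': '3', 'key_2': 5})

# With integers only, pandas infers an integer dtype
all_int = pd.Series({'key_0': 2, 'key_1': 3, 'key_2': 5})

print(mixed.dtype)
print(all_int.dtype)

# pd.to_numeric parses numeric strings, giving a numeric series
converted = pd.to_numeric(mixed)
print(converted.dtype)
```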

In [8]:
pd.Series(3.0, index= ['A', 'B', 'C'])

A    3.0
B    3.0
C    3.0
dtype: float64

In [9]:
s = pd.Series([98, 73, 65], index = ['Andrea', 'Beth', 'Carolina'])

# divide by 10
print(s / 10, '\n')

# exponential
print(np.exp(s), '\n')

# the original series is unchanged
print(s)

Andrea      9.8
Beth        7.3
Carolina    6.5
dtype: float64 

Andrea      3.637971e+42
Beth        5.052394e+31
Carolina    1.694889e+28
dtype: float64 

Andrea      98
Beth        73
Carolina    65
dtype: int64


In [10]:
# An element-wise comparison returns a boolean series

s > 70

Andrea       True
Beth         True
Carolina    False
dtype: bool
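A boolean series like this one is commonly used as a filter. The sketch below reuses the same scores under the hypothetical name `scores`, keeps only the entries above 70, and counts them.

```python
import pandas as pd

scores = pd.Series([98, 73, 65], index=['Andrea', 'Beth', 'Carolina'])

# Indexing with a boolean series keeps the rows where the condition is True
passing = scores[scores > 70]

# True counts as 1 when summed, so this counts the matches
n_passing = int((scores > 70).sum())

print(passing)
print(n_passing)
```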

## Evaluating missing values:
"In pandas we can represent a missing, NULL, or NA value with the float value numpy.nan, which stands for “not a number”. Let’s construct a small series with some NA values represented this way:"

In [11]:
# NAs in this series
s = pd.Series([1, 2, np.nan, 4, np.nan])
print(s)

0    1.0
1    2.0
2    NaN
3    4.0
4    NaN
dtype: float64


In [12]:
# To check for NAs:
s.hasnans

True

In [13]:
# which elements are na?

s.isna()

0    False
1    False
2     True
3    False
4     True
dtype: bool
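A few common follow-ups, sketched on the same data under the hypothetical name `s_na`: counting missing values, dropping them, and filling them. The fill value of 0 is an arbitrary choice for illustration. The sketch also shows why `isna()` is needed: NaN never compares equal to anything, including itself.

```python
import numpy as np
import pandas as pd

s_na = pd.Series([1, 2, np.nan, 4, np.nan])

# Comparing with == does not find NaN values
found_with_eq = bool((s_na == np.nan).any())

# Count the missing values
n_missing = int(s_na.isna().sum())

# Drop the missing values, or replace them with a fill value
dropped = s_na.dropna()
filled = s_na.fillna(0)

print(found_with_eq, n_missing)
print(dropped)
print(filled)
```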

The integer -999 is often used to represent missing values. Create a `pandas.Series` named `s` with four integer values, two of which are -999. The index of this series should be the letters A through D.

In the `pandas.Series` documentation, look for the method `mask()`. Use this method to update the series `s` so that the -999 values are replaced by NA values. HINT: check the first example in the method’s documentation.

In [14]:
s1 = pd.Series([4, -999, 67, -999], index = ['A', 'B', 'C', 'D'])

In [15]:
# Replace the -999 placeholders with NaN (the result is upcast to float64)
s1 = s1.mask(s1 == -999)
s1

A     4.0
B     NaN
C    67.0
D     NaN
dtype: float64
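As an alternative to `mask()`, not the approach the exercise asks for, `Series.replace` can swap a sentinel value for NaN directly. The sketch below uses a hypothetical series `s2` with the same data.

```python
import numpy as np
import pandas as pd

s2 = pd.Series([4, -999, 67, -999], index=['A', 'B', 'C', 'D'])

# replace returns a new series; s2 itself is left unchanged
cleaned = s2.replace(-999, np.nan)

print(cleaned)
print(cleaned.isna().sum())
```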

## A pandas DataFrame represents tabular data; think of it as a spreadsheet

Each column of a `pandas.DataFrame` is a `pandas.Series`

In [16]:
# There are many ways to create a pandas data frame. Here is one example:

# Initialize a dictionary with the columns' data

d = {
    'col_name_1' : pd.Series(np.arange(3)),
    'col_name_2' : pd.Series([3.1, 3.2, 3.3])
}

df = pd.DataFrame(d)
df

   col_name_1  col_name_2
0           0         3.1
1           1         3.2
2           2         3.3
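To see that each column really is a series, the sketch below rebuilds an equivalent data frame under the hypothetical name `df_demo` and selects one column from it.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    'col_name_1': np.arange(3),
    'col_name_2': [3.1, 3.2, 3.3],
})

# Selecting a single column with [] returns a pandas Series
col = df_demo['col_name_2']

print(type(col))
print(col.name)  # the series keeps the column name

# The column shares the data frame's index
shares_index = col.index.equals(df_demo.index)
```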


In [17]:
# changing the index
df.index = ['a', 'b', 'c']
df

   col_name_1  col_name_2
a           0         3.1
b           1         3.2
c           2         3.3
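Once the index has meaningful labels, rows and single cells can be selected by label with `.loc`. A minimal sketch follows, using an equivalent data frame under the hypothetical name `df_demo`.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    'col_name_1': np.arange(3),
    'col_name_2': [3.1, 3.2, 3.3],
})
df_demo.index = ['a', 'b', 'c']

# Select a whole row by its label; the result is a Series
row_b = df_demo.loc['b']

# Select a single cell by row label and column name
cell = df_demo.loc['b', 'col_name_2']

print(row_b)
print(cell)
```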


We can access the data frame’s column names via the `columns` attribute. Update the column names to `C1` and `C2` by updating this attribute.

In [20]:
df.columns = ['C1', 'C2']
df

   C1   C2
a   0  3.1
b   1  3.2
c   2  3.3
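Assigning to `columns` replaces every name at once. To change only some names, or to avoid modifying the original, `DataFrame.rename` is another option. The sketch below is illustrative; the new names `count` and `value` are hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'C1': [0, 1, 2], 'C2': [3.1, 3.2, 3.3]},
    index=['a', 'b', 'c'],
)

# rename maps old names to new ones and returns a new data frame
renamed = df_demo.rename(columns={'C1': 'count', 'C2': 'value'})

print(list(renamed.columns))
print(list(df_demo.columns))  # the original is unchanged
```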
