In [1]:
import pandas as pd
import numpy as np

## Series
A series is a *one-dimensional* array of *indexed* data. This **index** is the main difference between a `pandas.Series` and a NumPy array.

In [2]:
# A numpy array
arr = np.random.randn(4) # random values from std normal distribution
print(type(arr))
print(arr, "\n")

# A pandas series made from the previous array
s = pd.Series(arr)
print(type(s))
print(s)

<class 'numpy.ndarray'>
[ 0.90186815  1.8877376  -0.30477295 -0.60568551] 

<class 'pandas.core.series.Series'>
0    0.901868
1    1.887738
2   -0.304773
3   -0.605686
dtype: float64
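
To make the role of the index concrete, here is a minimal sketch comparing access by position in a NumPy array with access by label in a series. The names `s_labeled`, `second_by_label`, and the labels `'a'`, `'b'`, `'c'` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same values, stored once as a NumPy array and once as a labeled series
arr = np.array([10, 20, 30])
s_labeled = pd.Series(arr, index=['a', 'b', 'c'])  # hypothetical labels

# A NumPy array is accessed by integer position only
second_from_array = arr[1]

# A series can be accessed by label (.loc) or by position (.iloc)
second_by_label = s_labeled.loc['b']
second_by_position = s_labeled.iloc[1]

print(second_from_array, second_by_label, second_by_position)
print(s_labeled.index)
```

All three lookups return the same element; the series simply offers the label `'b'` as a second way to reach it.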


### Creating a pandas.Series
```
s = pd.Series(data, index=index)
```

In [3]:
# A series from a numpy array 
pd.Series(np.arange(3), index=[2023, 2024, 2025])

2023    0
2024    1
2025    2
dtype: int64

In [4]:
# A series from a list of strings with default index
pd.Series(['EDS 220', 'EDS 222', 'EDS 223', 'EDS 242'])

0    EDS 220
1    EDS 222
2    EDS 223
3    EDS 242
dtype: object

In [5]:
# Construct dictionary
d = {'key_0':2, 'key_1':'3', 'key_2':5}

# Initialize series using a dictionary
pd.Series(d)

key_0    2
key_1    3
key_2    5
dtype: object
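
The series above has dtype `object` because the dictionary mixes integers with the string `'3'`. As a related sketch, passing `index` together with a dictionary selects and orders keys, and a label that is not in the dictionary becomes NaN. The key `'key_9'` and the name `s_subset` are hypothetical, and the values here are all integers on purpose.

```python
import pandas as pd

# All-integer dictionary (unlike the mixed one above)
d_int = {'key_0': 2, 'key_1': 3, 'key_2': 5}

# Keep key_2 and key_0 in that order, and ask for a key that is not in d_int
s_subset = pd.Series(d_int, index=['key_2', 'key_0', 'key_9'])

print(s_subset)
print(s_subset.isna())
```

Because NaN is a floating-point value, the resulting series is stored as floats even though the dictionary held integers.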

In [6]:
# Create series from a single value
pd.Series(3.0, index = ['A', 'B', 'C'])

A    3.0
B    3.0
C    3.0
dtype: float64

### Simple operations

In [7]:
# Define a series
s = pd.Series([98,73,65],index=['Andrea', 'Beth', 'Carolina'])

# Divide each element in series by 10
print(s /10, '\n')

# Take the exponential of each element in series
print(np.exp(s), '\n')

# Original series is unchanged
print(s)

Andrea      9.8
Beth        7.3
Carolina    6.5
dtype: float64 

Andrea      3.637971e+42
Beth        5.052394e+31
Carolina    1.694889e+28
dtype: float64 

Andrea      98
Beth        73
Carolina    65
dtype: int64


In [8]:
# Produce a new series with True/False values indicating whether the elements satisfy the condition or not
s > 70

Andrea       True
Beth         True
Carolina    False
dtype: bool
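
A boolean series like the one above is commonly used to select elements. The following is a small sketch of that pattern; the names `s_scores`, `above_70`, `passed`, and `n_passed` are chosen here for illustration.

```python
import pandas as pd

s_scores = pd.Series([98, 73, 65], index=['Andrea', 'Beth', 'Carolina'])

# Boolean series: True where the condition holds
above_70 = s_scores > 70

# Use the boolean series to keep only the elements where it is True
passed = s_scores[above_70]

# True counts as 1 when summed, so this counts the matching elements
n_passed = above_70.sum()

print(passed)
print(n_passed)
```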

### Identify missing values

In [9]:
# Series with NAs in it
s = pd.Series([1, 2, np.nan, 4, np.nan])
s

0    1.0
1    2.0
2    NaN
3    4.0
4    NaN
dtype: float64

In [10]:
# Check if series has NAs (only a cell's last expression is displayed, so wrap this in print() to see the result)
s.hasnans

# Check which values are NA
s.isna()

0    False
1    False
2     True
3    False
4     True
dtype: bool
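
Once missing values are identified, a typical next step is to count or drop them. This sketch assumes the same values as the series above; `s_na`, `n_missing`, and `s_complete` are illustrative names.

```python
import numpy as np
import pandas as pd

s_na = pd.Series([1, 2, np.nan, 4, np.nan])

# hasnans is an attribute (no parentheses) that gives a single True/False
print(s_na.hasnans)

# isna() gives one boolean per element; summing it counts the NAs
n_missing = s_na.isna().sum()
print(n_missing)

# dropna() returns a new series without the NAs; s_na itself is unchanged
s_complete = s_na.dropna()
print(s_complete)
```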

## Check-in
1. The integer number -999 is often used to represent missing values. Create a `pandas.Series` named `s` with four integer values, two of which are -999. The index of this series should be the letters A through D.

In [11]:
s = pd.Series([12, -999, 45, -999], index = ['A', 'B', 'C', 'D'])
s

A     12
B   -999
C     45
D   -999
dtype: int64

2. In the `pandas.Series` documentation, look for the method `mask()`. Use this method to update the series `s` so that the -999 values are replaced by NA values. HINT: check the first example in the method’s documentation.

In [15]:
s = s.mask(s == -999)  # mask() returns a new series, so assign it back to update s
s

A    12.0
B     NaN
C    45.0
D     NaN
dtype: float64
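
Note that `mask()` returns a new series rather than changing the original in place, and that the result is stored as floats because NaN is a floating-point value. As a sketch, the same replacement can also be written with `where()` (which keeps values where the condition is True) or with `replace()`. The names `s_raw`, `masked`, `kept`, and `replaced` are illustrative.

```python
import numpy as np
import pandas as pd

s_raw = pd.Series([12, -999, 45, -999], index=['A', 'B', 'C', 'D'])

# mask(): replace values where the condition is True
masked = s_raw.mask(s_raw == -999)

# where(): keep values where the condition is True (the opposite logic)
kept = s_raw.where(s_raw != -999)

# replace(): swap one specific value for another
replaced = s_raw.replace(-999, np.nan)

print(masked)
print(masked.equals(kept))
print(s_raw)  # the original series still holds -999
```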

### Creating a pandas.DataFrame

In [16]:
# Initialize dictionary with columns' data 
d = {'col_name_1' : pd.Series(np.arange(3)),
     'col_name_2' : pd.Series([3.1, 3.2, 3.3]),
     }

# Create data frame
df = pd.DataFrame(d)
df

   col_name_1  col_name_2
0           0         3.1
1           1         3.2
2           2         3.3


In [17]:
# Change index
df.index = ['a','b','c']
df

   col_name_1  col_name_2
a           0         3.1
b           1         3.2
c           2         3.3


In [20]:
# Change column names
df.columns = ["C1", "C2"]
df

   C1   C2
a   0  3.1
b   1  3.2
c   2  3.3
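
To connect data frames back to series, the sketch below shows that selecting a single column gives a `pandas.Series`, and that `rename()` is an alternative to assigning `df.columns` directly. The names `df_demo`, `col`, and `df_renamed` are illustrative.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    {'col_name_1': pd.Series(np.arange(3)),
     'col_name_2': pd.Series([3.1, 3.2, 3.3])}
)

# Selecting one column returns a series that shares the data frame's index
col = df_demo['col_name_2']
print(type(col))

# rename() returns a new data frame and leaves df_demo unchanged
df_renamed = df_demo.rename(columns={'col_name_1': 'C1', 'col_name_2': 'C2'})
print(df_renamed.columns)
print(df_demo.columns)
```

Unlike assigning to `df.columns`, `rename()` maps old names to new ones explicitly, so it does not depend on the order of the columns.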


## Lesson Summary
This lesson covered several ways to construct a `pandas.Series` (from NumPy arrays, lists, dictionaries, and single values) and how to build a `pandas.DataFrame` from a dictionary of series. We also learned how to assign indices, rename columns, identify missing values with `.hasnans` and `.isna()`, and perform basic operations on series. I think the most important part is understanding how to manipulate series efficiently and effectively, the role they play in data frames, and the ways they differ from and overlap with NumPy arrays and operations.