# Basics - pandas.Series Data Structure


* [Intro to data structures - Series](https://pandas.pydata.org/docs/user_guide/dsintro.html#series)

```s = pd.Series(data, index=index)```



In [29]:
import numpy as np
import pandas as pd

---
# NaN

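NaN (not a number) is the standard **missing data marker** used in pandas. Use ```np.nan``` to mark a value as missing. Because NaN is a float, a Series that mixes integers with NaN is stored as ```float64```.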

In [32]:
_s = pd.Series(data=[1, np.nan, 3])
_s

0    1.0
1    NaN
2    3.0
dtype: float64
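
As a minimal sketch of how the missing marker behaves in practice, the snippet below detects NaN with `isna()` and shows that reductions skip it by default. The variable names `s_nan` and `missing_mask` are illustrative only.

```python
import numpy as np
import pandas as pd

s_nan = pd.Series(data=[1, np.nan, 3])

# isna() returns a boolean Series that is True where data is missing
missing_mask = s_nan.isna()
print(missing_mask)

# The integers were upcast so that NaN (a float) can be stored
print(s_nan.dtype)

# Reductions such as sum() skip NaN by default (skipna=True)
print(s_nan.sum())
```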

---
# Series Data

data can be many different things:

* a Python dict
* an ndarray
* a scalar value (like 5)

In [21]:
s_scalar = pd.Series(data=1)
s_array = pd.Series(data=[1,2,3])
s_dict = pd.Series(data={"d":1, "b":2, "a":3})  # dict insertion order is preserved

print(f"scalar\n{s_scalar}")
print(f"\narray\n{s_array}")
print(f"\ndict\n{s_dict}")

scalar
0    1
dtype: int64

array
0    1
1    2
2    3
dtype: int64

dict
d    1
b    2
a    3
dtype: int64
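
The examples above rely on the default index. The sketch below passes the `index` argument from the constructor signature at the top of this page, to show how it interacts with each kind of `data`. The labels and variable names are made up for illustration.

```python
import pandas as pd

# A scalar is repeated to match the length of the index
s_repeated = pd.Series(data=5, index=["a", "b", "c"])

# A list takes the given labels in place of the default 0..n-1
s_labeled = pd.Series(data=[1, 2, 3], index=["x", "y", "z"])

# With a dict, index selects and orders the keys;
# a label missing from the dict ("z") becomes NaN
s_selected = pd.Series(data={"d": 1, "b": 2, "a": 3}, index=["a", "b", "z"])

print(s_repeated)
print(s_labeled)
print(s_selected)
```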


## Series like np.ndarray


Series acts very similarly to an ndarray, and is a valid argument to most NumPy functions.

In [14]:
s_array > 1

0    False
1     True
2     True
dtype: bool

In [13]:
s_array[1:] + [1,1]

1    3
2    4
dtype: int64

In [7]:
np.exp(s_array)

0     2.718282
1     7.389056
2    20.085537
dtype: float64

## Series like Python dict

In [11]:
s_array[1]

2

In [10]:
s_array.get(1)

2
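
The dict-like behaviour extends to membership tests and missing labels. A small sketch, using a string-labelled Series so that labels and values cannot be confused:

```python
import pandas as pd

s_dict = pd.Series(data={"d": 1, "b": 2, "a": 3})

# "in" tests the index labels, not the values
print("b" in s_dict)
print(2 in s_dict)

# get() returns None (or a supplied default) for a missing label
print(s_dict.get("z"))
print(s_dict.get("z", 0))

# Square-bracket access raises KeyError for a missing label
try:
    s_dict["z"]
    raised = False
except KeyError:
    raised = True
print(raised)
```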

# Series Index

## Default Index like Python range()

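The default index is a ```RangeIndex```, which stores only ```start```, ```stop``` and ```step```. Its values are **NOT materialized** until used (lazy evaluation/instantiation), hence memory efficient.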

In [15]:
s_array.index

RangeIndex(start=0, stop=3, step=1)

In [16]:
for i in s_array.index:
    print(s_array[i])

1
2
3
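
To get a rough feel for the memory saving, the sketch below compares a `RangeIndex` with an index built from an explicit array of the same integers. Exact byte counts depend on the platform and pandas version, so only the relative size matters here.

```python
import numpy as np
import pandas as pd

n = 1_000_000

# Stores only start, stop and step
lazy_index = pd.RangeIndex(n)

# Stores every one of the n integers
materialized_index = pd.Index(np.arange(n))

print(lazy_index.memory_usage())
print(materialized_index.memory_usage())

# Elements are still available on demand
print(lazy_index[-1])
```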


## Dictionary Key Index is NOT like range()



In [23]:
s_dict.index

Index(['d', 'b', 'a'], dtype='object')

## Index can be sorted

In [27]:
s_dict.sort_index()

a    3
b    2
d    1
dtype: int64

# Series Array (replace Series.values)

**Prefer ```pd.Series.array``` to ```pd.Series.values```**.

* [pandas.Series.array](https://pandas.pydata.org/docs/reference/api/pandas.Series.array.html)

## array is np.ndarray compatible

For NumPy native types, this is a **thin (no copy) wrapper around numpy.ndarray**.



In [17]:
s_array.array 

<PandasArray>
[1, 2, 3]
Length: 3, dtype: int64

In [18]:
s_array.array[1:3]

<PandasArray>
[2, 3]
Length: 2, dtype: int64

In [26]:
# .values returns a plain np.ndarray for NumPy dtypes (it does not necessarily copy)
s_array.values

array([1, 2, 3])
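
One practical reason to avoid `.values` is that its return type depends on the dtype, whereas `.array` and `.to_numpy()` are each predictable. The sketch below checks this with an integer Series and a categorical one; `s_int` and `s_cat` are illustrative names.

```python
import numpy as np
import pandas as pd

s_int = pd.Series(data=[1, 2, 3])
s_cat = pd.Series(data=["a", "b", "a"], dtype="category")

# .values: an ndarray for int64, but a pandas Categorical for category
print(type(s_int.values))
print(type(s_cat.values))

# .array: always a pandas ExtensionArray
print(type(s_int.array))
print(type(s_cat.array))

# .to_numpy(): always a NumPy ndarray
print(type(s_int.to_numpy()))
print(type(s_cat.to_numpy()))
```

Use `.array` when the data should stay in its pandas form, and `.to_numpy()` when a NumPy array is needed.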

---