## Series

In [1]:
import numpy as np 
import pandas as pd 

from pandas import Series, DataFrame

In [2]:
# a simple series
obj = pd.Series([4, 7, -5, 3])
obj

0    4
1    7
2   -5
3    3
dtype: int64

In [3]:
# array representation and index object of the Series
print(obj.array)  # pandas extension array wrapping the underlying data
print(obj.index)

<NumpyExtensionArray>
[np.int64(4), np.int64(7), np.int64(-5), np.int64(3)]
Length: 4, dtype: int64
RangeIndex(start=0, stop=4, step=1)
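If a plain NumPy array is needed instead of the pandas array wrapper, `to_numpy()` is the usual route. A minimal sketch; the names `values` and `labels` are illustrative and not part of the notebook above:

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

values = obj.to_numpy()      # plain numpy.ndarray holding the data
labels = obj.index.tolist()  # index labels as a regular Python list

print(type(values))
print(values)
print(labels)
```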


In [4]:
obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
obj2

d    4
b    7
a   -5
c    3
dtype: int64

In [5]:
obj2.index

Index(['d', 'b', 'a', 'c'], dtype='object')

In [8]:
print(obj2["a"])
print(obj2["d"])

-5
4


In [9]:
obj2["d"] = 6
obj2[["c", "d", "a"]] # interpreted as a list of index labels, even though it contains strings instead of integers

c    3
d    6
a   -5
dtype: int64
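The same selection can be written with the explicit `.loc` (label-based) and `.iloc` (position-based) indexers, which makes the intent unambiguous. A small sketch that rebuilds `obj2` with the values it has at this point; `by_label` and `by_position` are illustrative names:

```python
import pandas as pd

obj2 = pd.Series([6, 7, -5, 3], index=["d", "b", "a", "c"])

by_label = obj2.loc[["c", "d", "a"]]   # select by index label
by_position = obj2.iloc[[3, 0, 2]]     # select the same rows by integer position

print(by_label)
print(by_label.equals(by_position))
```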

In [11]:
# operations
obj2[obj2 > 0]

d    6
b    7
c    3
dtype: int64

In [12]:
obj2 * 2

d    12
b    14
a   -10
c     6
dtype: int64

In [13]:
np.exp(obj2)

d     403.428793
b    1096.633158
a       0.006738
c      20.085537
dtype: float64
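The filtering, scalar multiplication, and NumPy function calls above all keep the link between index labels and values. As a rough sketch of why the filter works, the comparison itself returns a boolean Series on the same index, which is then used as a mask (`mask` and `positive` are illustrative names):

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([6, 7, -5, 3], index=["d", "b", "a", "c"])

mask = obj2 > 0          # boolean Series with the same index as obj2
positive = obj2[mask]    # keeps only the rows where the mask is True

print(mask)
print(positive)

# Element-wise NumPy functions also return a Series with the original index
print(np.exp(obj2).index.equals(obj2.index))
```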

Another way to think about a Series is as a fixed-length, ordered dictionary, since it is a
mapping of index values to data values.

In [14]:
print("b" in obj2)
print("e" in obj2)

True
False
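As with a dictionary, `in` tests the index labels (the "keys"), not the stored values. The sketch below illustrates the difference and also uses `Series.get`, which returns a default instead of raising when a label is absent; the default value `0` is an arbitrary choice for illustration:

```python
import pandas as pd

obj2 = pd.Series([6, 7, -5, 3], index=["d", "b", "a", "c"])

print("b" in obj2)        # True: "b" is an index label
print(7 in obj2)          # False: 7 is a value, not a label
print(7 in obj2.values)   # True: membership test against the data itself

missing = obj2.get("e", 0)  # label "e" is absent, so the default is returned
print(missing)
```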


In [15]:
# dictionary to series
sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)
obj3

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

In [16]:
# convert the Series back to a dictionary
obj3.to_dict()

{'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

In [17]:
# override order of keys by passing an index 
states = ["California", "Ohio", "Oregon", "Texas"]
obj4 = pd.Series(sdata, index=states)
obj4 # NaN (Not a Number) -> used to mark missing or NA values

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
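Two side effects of passing an explicit index are worth noting: keys that are not in the index (here "Utah") are dropped, and the dtype becomes `float64` because `NaN` is a floating-point value. A short sketch that checks both, and shows that building the Series first and then calling `reindex` gives the same result (`same` is an illustrative name):

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
states = ["California", "Ohio", "Oregon", "Texas"]

obj4 = pd.Series(sdata, index=states)

print("Utah" in obj4)   # "Utah" is not in `states`, so it is excluded
print(obj4.dtype)       # float64, because NaN cannot be stored in an int64 Series

# Equivalent two-step construction: build from the dict, then reindex
same = pd.Series(sdata).reindex(states)
print(same.equals(obj4))
```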

In [23]:
# The 'isna' and 'notna' functions in pandas should be used to detect missing data
print(pd.isna(obj4))
print(pd.notna(obj4))

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool


In [24]:
# Series also has these as instance methods
obj4.isna()

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
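Because `isna()` and `notna()` return boolean Series, they can be summed to count missing entries or used directly as a filter. A minimal sketch, with `n_missing` and `present` as illustrative names:

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj4 = pd.Series(sdata, index=["California", "Ohio", "Oregon", "Texas"])

n_missing = obj4.isna().sum()   # True counts as 1, so this counts the NaN entries
present = obj4[obj4.notna()]    # keep only the rows that have a value

print(n_missing)
print(present)
```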

In [25]:
# Series automatically aligns by index label in arithmetic operations, for example:

print(obj3)
print(obj4)

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64


In [26]:
obj3 + obj4 
# Utah is NaN because of how pandas aligns labels in operations between Series objects:
# if a label exists in one Series but not in the other, the result for that label is NaN (missing data).
# California is NaN as well: it is absent from obj3 and already NaN in obj4.

California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64
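When a label that is missing on one side should be treated as zero rather than propagate `NaN`, the method form `Series.add` accepts a `fill_value`. A sketch under that assumption; note that a label missing on both sides (California here) still yields `NaN`:

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=["California", "Ohio", "Oregon", "Texas"])

plain = obj3 + obj4                    # Utah becomes NaN (present only in obj3)
total = obj3.add(obj4, fill_value=0)   # a value missing on one side is treated as 0

print(plain)
print(total)
```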

In [27]:
# A Series’s index can be altered in place by assignment
obj

0    4
1    7
2   -5
3    3
dtype: int64

In [29]:
obj.index = ["apple", "pomegranate", "banana", "mango"]
obj

apple          4
pomegranate    7
banana        -5
mango          3
dtype: int64
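Assigning a new index only works when the number of labels matches the number of values; otherwise pandas raises a `ValueError` and leaves the Series unchanged. A small sketch with placeholder labels (`"w"` through `"z"` and the `raised` flag are illustrative):

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])
obj.index = ["w", "x", "y", "z"]   # four labels for four values: accepted

raised = False
try:
    obj.index = ["w", "x"]         # two labels for four values: rejected
except ValueError as exc:
    raised = True
    print("Index assignment failed:", exc)

print(obj.index.tolist())          # the earlier four-label index is still in place
```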

The `name` attribute is a metadata field that lets you assign a label or identifier to the Series.

##### Key Points About the `name` Attribute:
1. **Purpose**:
   - It is used to label a `Series` for easier identification.
   - When a `Series` is part of a `DataFrame`, its `name` serves as the column name.

2. **Default Value**:
   - If not explicitly assigned, the `name` attribute is `None`.

3. **Modifying the `name` Attribute**:
   - The `name` attribute can be set or changed at any time using assignment.

4. **Usage in DataFrames**:
   - If a `Series` is added to a `DataFrame`, the `name` becomes the column name for that `Series`.
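The default value and the column-name behaviour described in these points can be checked with a short sketch. It uses `Series.to_frame()` to turn a named Series into a one-column DataFrame; the variable names are illustrative:

```python
import pandas as pd

unnamed = pd.Series([1, 2])
print(unnamed.name)                 # None when no name has been assigned

population = pd.Series({"Ohio": 35000, "Texas": 71000}, name="population")
frame = population.to_frame()       # the Series name becomes the column name
print(frame.columns.tolist())

population.name = "pop_2010"        # the name can be reassigned at any time
print(population.name)
```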



In [35]:
obj4.name = "population"
obj4.index.name = "state"
obj4

state
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
Name: population, dtype: float64