# pandas.Series

### One-dimensional ndarray with axis labels (including time series).

- Labels need not be unique but must be a hashable type. 
- The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. 
- Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).
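The three points above can be illustrated with a small sketch. The data and variable names here are made up for illustration and are not part of the employment dataset used later.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the label "a" appears twice and one value is missing.
s_demo = pd.Series([1.0, np.nan, 3.0, 5.0], index=["a", "b", "a", "c"])

by_label = s_demo.loc["a"]       # label-based: returns both rows labelled "a"
by_position = s_demo.iloc[3]     # position-based: the fourth value

mean_skipping_nan = s_demo.mean()            # NaN is ignored: (1 + 3 + 5) / 3
mean_with_nan = s_demo.mean(skipna=False)    # NaN propagates into the result

print(by_label.tolist())
print(by_position, mean_skipping_nan, mean_with_nan)
```

Using `.loc` and `.iloc` explicitly keeps label-based and position-based lookups unambiguous, which matters most when the labels are themselves integers.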

Operations between Series (+, -, /, *, **) align values based on their associated index values; the Series need not be the same length. The result index will be the sorted union of the two indexes.
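A minimal sketch of index alignment, using two made-up Series of different lengths. Labels present in only one operand produce NaN unless a fill value is supplied through the method form `Series.add`.

```python
import pandas as pd

# Hypothetical operands that share only the label "z".
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["z", "w"])

total = a + b                      # aligned on labels; unmatched labels give NaN
filled = a.add(b, fill_value=0)    # treat a missing operand as 0 instead

print(total)
print(filled)
```

The result has one row per label in the union of both indexes, so its length can exceed that of either input.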

### Parameters:
- data: array-like, Iterable, dict, or scalar value
    - The data stored in the Series. If data is a dict, the insertion order of its keys is preserved.

- index: array-like or Index (1d)
    - Values must be hashable and have the same length as data. 
    - Non-unique index values are allowed. 
    - Will default to RangeIndex (0, 1, 2, …, n - 1), where n is the length of data, if not provided. 
    - If data is dict-like and index is None, then the keys in the data are used as the index. 
    - If the index is not None, the resulting Series is reindexed with the index values.

- dtype: str, numpy.dtype, or ExtensionDtype, optional
    - Data type for the output Series. If not specified, this will be inferred from data. See the user guide for more usages.

- name: Hashable, default None
    - The name to give to the Series.

- copy: bool, default False
    - Copy input data. Only affects Series or 1d ndarray input. See examples.
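The parameters above can be combined as in the following sketch. The values and the name `"five"` are arbitrary choices for illustration.

```python
import pandas as pd

# A scalar is broadcast across the explicit index; dtype and name are set directly.
scalar_ser = pd.Series(5, index=["a", "b", "c"], dtype="float64", name="five")

# Without an index, a RangeIndex starting at 0 is created.
default_ser = pd.Series([10, 20, 30])

print(scalar_ser)
print(default_ser.index)
```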

### Example:
- Constructing Series from a dictionary with an Index specified

In [2]:
import pandas as pd
d = {'a': 1, 'b': 2, 'c': 3}
ser = pd.Series(data=d, index=['a', 'b', 'c'])
ser

a    1
b    2
c    3
dtype: int64

In [3]:
# None of the dictionary keys match the Index values, so every value in the result is NaN.
d = {'a': 1, 'b': 2, 'c': 3}
ser = pd.Series(data=d, index=['x', 'y', 'z'])
ser

x   NaN
y   NaN
z   NaN
dtype: float64

- The Index is first built from the keys of the dictionary. 
- The Series is then reindexed with the given Index values; since none of them match a key, every value is NaN.
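A partial overlap makes the reindexing step easier to see. In this sketch (the index labels are chosen for illustration), matching keys keep their values, the key `"b"` is dropped, and the unmatched label `"x"` becomes NaN.

```python
import pandas as pd

d = {"a": 1, "b": 2, "c": 3}

# Only "a" and "c" exist in the dictionary; "x" has no matching key.
partial = pd.Series(data=d, index=["a", "c", "x"])

print(partial)
```

Because NaN is a floating-point value, the resulting dtype is float64 rather than int64.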

### Constructing Series from a list with copy=False.

- Because the input is a list, the Series holds a copy of the original data even though copy=False, so the list r is unchanged.

In [5]:
r = [1, 2]
ser = pd.Series(r, copy=False)
ser.iloc[0] = 999

In [6]:
ser

0    999
1      2
dtype: int64

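### Constructing Series from a 1d ndarray with copy=False.

- Because the input is an ndarray, the Series holds a view on the original data, so modifying the Series also modifies r.

In [7]:
import numpy as np
r = np.array([1, 2])
ser = pd.Series(r, copy=False)
ser.iloc[0] = 999
r

array([999,   2])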

In [8]:
ser

0    999
1      2
dtype: int64
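One way to check whether a Series shares its buffer with a NumPy array is `np.shares_memory`. This is a sketch with made-up values; the exact copy semantics can vary between pandas versions (for example under Copy-on-Write), so treat the comments as expectations rather than guarantees.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

shared = pd.Series(arr, copy=False)   # expected to reuse the array's memory
copied = pd.Series(arr, copy=True)    # always gets its own buffer

shares_view = np.shares_memory(arr, shared.to_numpy())
shares_copy = np.shares_memory(arr, copied.to_numpy())

print(shares_view, shares_copy)
```

Passing `copy=True` is the safer choice whenever the original array must not be affected by later edits to the Series.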

### Series Attributes and Methods
- Read the employment-to-population data file and apply various pandas Series attributes and methods to it.

In [9]:
import pandas as pd

In [28]:
s = pd.read_csv("employment_to_population_1979_to_2022.csv", usecols=["year"]).squeeze()
s

0     2022
1     2021
2     2020
3     2019
4     2018
5     2017
6     2016
7     2015
8     2014
9     2013
10    2012
11    2011
12    2010
13    2009
14    2008
15    2007
16    2006
17    2005
18    2004
19    2003
20    2002
21    2001
22    2000
23    1999
24    1998
25    1997
26    1996
27    1995
28    1994
29    1993
30    1992
31    1991
32    1990
33    1989
34    1988
35    1987
36    1986
37    1985
38    1984
39    1983
40    1982
41    1981
42    1980
43    1979
Name: year, dtype: int64

In [29]:
# Check the type of the object
type(s)

pandas.core.series.Series
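The reason `.squeeze()` is needed: `read_csv` returns a DataFrame even when a single column is selected, and squeezing turns that one-column frame into a Series. The sketch below uses an in-memory CSV holding the first three rows shown above as a stand-in for the real file.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file (first three rows only).
csv_text = "year,all\n2022,58.5\n2021,56.5\n2020,60.8\n"

frame = pd.read_csv(io.StringIO(csv_text), usecols=["year"])  # one-column DataFrame
series = frame.squeeze("columns")                             # DataFrame -> Series

print(type(frame).__name__, type(series).__name__)
```

Passing `"columns"` restricts the squeeze to the column axis, so a file with a single data row would still yield a Series rather than a scalar.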

# Attributes

### Series.index - The index (axis labels) of the Series.

In [30]:
s.index

RangeIndex(start=0, stop=44, step=1)

### Series.values - Return Series as ndarray or ndarray-like depending on the dtype.

In [31]:
s.values

array([2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012,
       2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001,
       2000, 1999, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990,
       1989, 1988, 1987, 1986, 1985, 1984, 1983, 1982, 1981, 1980, 1979],
      dtype=int64)

### Series.dtype - Return the dtype object of the underlying data.

In [32]:
s.dtype

dtype('int64')

### Series.shape - Return a tuple of the shape of the underlying data.

In [33]:
s.shape

(44,)

### Series.dtypes - Return the dtype object of the underlying data (same result as Series.dtype).

In [34]:
s.dtypes

dtype('int64')

### Series.name - Return the name of the Series.

In [35]:
s.name

'year'
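The attributes above can be compared side by side on a small Series. This sketch uses three values from the dataset with the year as a custom index; the variable name `s_small` is only for illustration. `to_numpy()` is included because the pandas documentation recommends it over `.values` when a NumPy array is wanted.

```python
import pandas as pd

s_small = pd.Series([58.5, 56.5, 60.8], index=[2022, 2021, 2020], name="all")

print(s_small.index.tolist())            # the axis labels
print(s_small.to_numpy())                # the data as a NumPy array
print(s_small.dtype == s_small.dtypes)   # both attributes report the same dtype
print(s_small.shape, s_small.size, s_small.name)
```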

In [39]:
s = pd.read_csv("employment_to_population_1979_to_2022.csv", usecols=["all"]).squeeze()
s

0     58.5
1     56.5
2     60.8
3     60.5
4     60.1
5     59.7
6     59.4
7     59.1
8     58.6
9     58.6
10    58.4
11    58.5
12    59.1
13    62.0
14    63.0
15    63.2
16    62.7
17    62.3
18    62.3
19    62.7
20    63.5
21    64.4
22    64.3
23    64.1
24    63.9
25    63.3
26    62.9
27    62.7
28    61.8
29    61.5
30    61.6
31    62.7
32    63.0
33    62.4
34    61.6
35    60.8
36    60.3
37    59.7
38    58.2
39    57.8
40    59.0
41    59.2
42    60.1
43    59.6
Name: all, dtype: float64

# Series Methods

### Series.count() - Return number of non-NA/null observations in the Series.



In [40]:
s.count()

44
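The employment column has no missing values, so `count()` equals the length. With gaps in the data the two numbers differ, as this sketch with made-up values suggests.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with two missing entries (NaN and None).
with_gaps = pd.Series([58.5, np.nan, 60.8, None])

non_null = with_gaps.count()   # only the non-missing observations
total = len(with_gaps)         # every row, missing or not

print(non_null, total)
```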

### Series.max() - Return the maximum of the values over the requested axis.

In [41]:
s.max()

64.4

### Series.min() - Return the minimum of the values over the requested axis.

In [42]:
s.min()

56.5
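`max()` and `min()` return only the values. To find where an extreme occurs, `idxmax()` and `idxmin()` return the corresponding index label. The sketch below assumes a Series indexed by year, which is not how `s` is indexed above, and uses three values from the dataset.

```python
import pandas as pd

rates = pd.Series([58.5, 56.5, 60.8], index=[2022, 2021, 2020], name="all")

highest = rates.max()
year_of_highest = rates.idxmax()   # label of the largest value
lowest = rates.min()
year_of_lowest = rates.idxmin()    # label of the smallest value

print(highest, year_of_highest, lowest, year_of_lowest)
```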

### Series.mean([axis, skipna, numeric_only]) - Return the mean of the values over the requested axis.

In [43]:
s.mean()

61.0090909090909

### Series.rank([axis, method, numeric_only, ...]) - Compute numerical data ranks (1 through n) along axis.

In [44]:
s.rank()

0      5.5
1      1.0
2     21.5
3     20.0
4     17.5
5     15.5
6     13.0
7     10.5
8      7.5
9      7.5
10     4.0
11     5.5
12    10.5
13    27.0
14    36.5
15    38.0
16    32.5
17    28.5
18    28.5
19    32.5
20    40.0
21    44.0
22    43.0
23    42.0
24    41.0
25    39.0
26    35.0
27    32.5
28    26.0
29    23.0
30    24.5
31    32.5
32    36.5
33    30.0
34    24.5
35    21.5
36    19.0
37    15.5
38     3.0
39     2.0
40     9.0
41    12.0
42    17.5
43    14.0
Name: all, dtype: float64
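The fractional ranks in the output (for example 5.5 at rows 0 and 11, which both hold 58.5) come from the default `method="average"`, where tied values share the mean of the ranks they occupy. A short sketch with made-up values comparing a few tie-breaking methods:

```python
import pandas as pd

values = pd.Series([58.5, 56.5, 58.5, 60.8])

average_rank = values.rank()                  # ties share the mean rank: 2.5 and 2.5
min_rank = values.rank(method="min")          # ties take the lowest rank: 2 and 2
dense_rank = values.rank(method="dense")      # like "min", but no gap after a tie
descending = values.rank(ascending=False)     # largest value gets rank 1

print(average_rank.tolist())
print(min_rank.tolist())
print(dense_rank.tolist())
print(descending.tolist())
```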

### Series.sum([axis, skipna, numeric_only, ...]) - Return the sum of the values over the requested axis.

In [45]:
s.sum()

2684.3999999999996

### Series.var([axis, skipna, ddof, numeric_only]) - Return unbiased variance over requested axis.

In [46]:
s.var()

4.229682875264268
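"Unbiased" refers to the default `ddof=1`, which divides the sum of squared deviations by n - 1. Setting `ddof=0` gives the population variance instead. A minimal sketch with made-up values:

```python
import pandas as pd

sample = pd.Series([1.0, 2.0, 3.0, 4.0])

unbiased = sample.var()            # ddof=1: 5 / (4 - 1)
population = sample.var(ddof=0)    # ddof=0: 5 / 4
std = sample.std()                 # square root of the unbiased variance

print(unbiased, population, std)
```

The same relationship holds for the employment data: the `std` reported by `describe()` is the square root of the value returned by `s.var()`.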

### Series.where(cond[, other, inplace, axis, level]) - Replace values where the condition is False.

In [50]:
s.where(s > 60, other = 0)

0      0.0
1      0.0
2     60.8
3     60.5
4     60.1
5      0.0
6      0.0
7      0.0
8      0.0
9      0.0
10     0.0
11     0.0
12     0.0
13    62.0
14    63.0
15    63.2
16    62.7
17    62.3
18    62.3
19    62.7
20    63.5
21    64.4
22    64.3
23    64.1
24    63.9
25    63.3
26    62.9
27    62.7
28    61.8
29    61.5
30    61.6
31    62.7
32    63.0
33    62.4
34    61.6
35    60.8
36    60.3
37     0.0
38     0.0
39     0.0
40     0.0
41     0.0
42    60.1
43     0.0
Name: all, dtype: float64
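Two details of `where` are easy to miss: without `other`, replaced values become NaN, and `Series.mask` is the inverse operation (it replaces values where the condition is True). A sketch with three made-up values:

```python
import pandas as pd

rates = pd.Series([58.5, 60.8, 62.0])

kept = rates.where(rates > 60)               # values failing the condition become NaN
zeroed = rates.where(rates > 60, other=0)    # ... or the supplied replacement
masked = rates.mask(rates > 60, other=0)     # inverse: replace where the condition holds

print(kept.tolist())
print(zeroed.tolist())
print(masked.tolist())
```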

### Series.describe([percentiles, include, exclude]) - Generate descriptive statistics.

In [51]:
s.describe()

count    44.000000
mean     61.009091
std       2.056619
min      56.500000
25%      59.175000
50%      61.150000
75%      62.700000
max      64.400000
Name: all, dtype: float64
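The `percentiles` argument changes which quantiles are reported. The sketch below requests the 10th, 50th, and 90th percentiles on made-up values; percentiles are given as fractions between 0 and 1.

```python
import pandas as pd

sample = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Request custom quantiles instead of the default 25%, 50%, 75%.
summary = sample.describe(percentiles=[0.1, 0.5, 0.9])

print(summary)
```

The result is itself a Series, so individual statistics can be read by label, for example `summary["50%"]`.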

### Series.diff([periods]) - First discrete difference of elements (each value minus the value `periods` rows earlier).

In [52]:
s.diff()

0     NaN
1    -2.0
2     4.3
3    -0.3
4    -0.4
5    -0.4
6    -0.3
7    -0.3
8    -0.5
9     0.0
10   -0.2
11    0.1
12    0.6
13    2.9
14    1.0
15    0.2
16   -0.5
17   -0.4
18    0.0
19    0.4
20    0.8
21    0.9
22   -0.1
23   -0.2
24   -0.2
25   -0.6
26   -0.4
27   -0.2
28   -0.9
29   -0.3
30    0.1
31    1.1
32    0.3
33   -0.6
34   -0.8
35   -0.8
36   -0.5
37   -0.6
38   -1.5
39   -0.4
40    1.2
41    0.2
42    0.9
43   -0.5
Name: all, dtype: float64
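Because the rows run from 2022 back to 1979, `s.diff()` subtracts the following year's value from each year's value. A negative `periods` reverses the direction, which gives the conventional year-over-year change for data sorted this way. A sketch using the first four values of the column:

```python
import pandas as pd

rates = pd.Series([58.5, 56.5, 60.8, 60.5])   # rows for 2022, 2021, 2020, 2019

step_one = rates.diff()              # row i minus row i-1
step_two = rates.diff(periods=2)     # row i minus row i-2
backward = rates.diff(periods=-1)    # row i minus row i+1

print(step_one.round(1).tolist())
print(step_two.round(1).tolist())
print(backward.round(1).tolist())
```

Rows without a partner (the first `periods` rows, or the last ones for a negative value) are NaN.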