# Introduction to working with Pandas Series
**Chapters 4 and 5** from [Effective Pandas](https://store.metasnake.com/effective-pandas-book)

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv('vehicles.csv')
data.shape, type(data)

((41144, 83), pandas.core.frame.DataFrame)

In [5]:
city_mpg = data.city08 # miles per gallon in city driving
highway_mpg = data.highway08 # miles per gallon in highway driving

type(city_mpg), type(highway_mpg)

(pandas.core.series.Series, pandas.core.series.Series)

How many attributes are available for working with a Series object?

In [21]:
len(dir(city_mpg)), len(dir(highway_mpg)) 

(420, 420)

## `is_monotonic`
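Return `True` if the values in the object are monotonically increasing (each value is greater than or equal to the previous one). Note that `is_monotonic` was deprecated in pandas 1.5 and removed in 2.0; `is_monotonic_increasing` is the equivalent property.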

In [26]:
city_mpg.is_monotonic

False

In [39]:
highway_mpg.is_monotonic

False

In [40]:
pd.Series([1,2,3,4,5]).is_monotonic

True

In [41]:
pd.Series([1,2,3,4,20]).is_monotonic

True

In [42]:
pd.Series([1,2,3,4,1]).is_monotonic

False
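The explicitly named properties `is_monotonic_increasing` and `is_monotonic_decreasing` make the direction of the check clear. A minimal sketch on small hand-made Series (the variable names are illustrative and not part of the vehicles dataset):

```python
import pandas as pd

# Hand-made examples, not taken from vehicles.csv
increasing = pd.Series([1, 2, 3, 4, 20])
decreasing = pd.Series([5, 4, 4, 1])
mixed = pd.Series([1, 2, 3, 4, 1])

# Both checks are non-strict: repeated values do not break monotonicity
print(increasing.is_monotonic_increasing)
print(decreasing.is_monotonic_decreasing)
print(mixed.is_monotonic_increasing, mixed.is_monotonic_decreasing)
```

The gaps between values do not matter, only their order; that is why `[1, 2, 3, 4, 20]` counts as increasing.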

## `sort_values`
Sort by the values along either axis.<br>
This method can also be applied to a whole DataFrame.

In [43]:
highway_mpg.sort_values()
# note that the Series is sorted in ascending order but keeps the index labels it had in the original Series

23231      9
1979       9
26858      9
1990       9
23176      9
        ... 
32599    122
34312    123
32815    123
34563    124
34564    124
Name: highway08, Length: 41144, dtype: int64

In [48]:
top_10 = highway_mpg.sort_values(ascending=False)[:10]
top_10
# a quick way to select the entries with the 10 highest values.

34563    124
34564    124
34312    123
32815    123
32599    122
31256    122
33423    122
32740    120
34311    117
34168    117
Name: highway08, dtype: int64
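As an alternative to sorting and slicing, `Series.nlargest` returns the largest values directly. The sketch below uses a small made-up Series (`mpg`) standing in for `highway_mpg`, so the numbers are illustrative only:

```python
import pandas as pd

# Made-up sample standing in for highway_mpg
mpg = pd.Series([25, 14, 33, 12, 23, 124, 122], name="highway08")

# Sort in descending order, then keep the first three rows
by_sort = mpg.sort_values(ascending=False).iloc[:3]

# Ask for the three largest values directly
by_nlargest = mpg.nlargest(3)

print(by_nlargest)
```

Both approaches keep the original index labels. `nlargest` avoids sorting the whole Series, which may help when only a few values are needed.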

Note that the `top_10` Series still keeps the index labels its values had in the `highway_mpg` Series.<br>
To reset the index of the new Series, use the `reset_index` method.

## `reset_index`
Reset the index of the DataFrame (or Series), and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.

In [52]:
top_10.reset_index(drop = True)

0    124
1    124
2    123
3    123
4    122
5    122
6    122
7    120
8    117
9    117
Name: highway08, dtype: int64

In [53]:
# without drop=True, reset_index() returns a DataFrame in which the old index
# becomes a new column
top_10.reset_index()

   index  highway08
0  34563        124
1  34564        124
2  34312        123
3  32815        123
4  32599        122
5  31256        122
6  33423        122
7  32740        120
8  34311        117
9  34168        117
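The difference in return type can be checked directly. This is a small sketch on a hand-made Series (`top`); the `name="mpg"` label is an arbitrary choice for illustration:

```python
import pandas as pd

# Hand-made Series with non-default index labels
top = pd.Series([124, 123, 122], index=[34563, 34312, 32599], name="highway08")

as_series = top.reset_index(drop=True)   # Series with a fresh 0..n-1 index
as_frame = top.reset_index()             # DataFrame: old index becomes the "index" column
renamed = top.reset_index(name="mpg")    # DataFrame with the value column renamed

print(type(as_series).__name__, type(as_frame).__name__)
print(list(as_frame.columns), list(renamed.columns))
```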


## `add_prefix` | `add_suffix` 
Add a string prefix (`add_prefix`) or suffix (`add_suffix`) to the labels.<br>

For a Series, the row labels are modified. For a DataFrame, the column labels are modified.

In [55]:
highway_mpg.add_prefix('USA-')

USA-0        25
USA-1        14
USA-2        33
USA-3        12
USA-4        23
             ..
USA-41139    26
USA-41140    28
USA-41141    24
USA-41142    24
USA-41143    21
Name: highway08, Length: 41144, dtype: int64

In [58]:
city_mpg.add_suffix('-item')

0-item        19
1-item         9
2-item        23
3-item        10
4-item        17
              ..
41139-item    19
41140-item    20
41141-item    18
41142-item    18
41143-item    16
Name: city08, Length: 41144, dtype: int64

## `apply`
Invoke function on values of Series.<br>

The function can be a ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.

In [62]:
def mpg_class(val):
    # Low: below 25 mpg, Medium: 25 up to (but not including) 50, High: 50 and above
    if val < 25:
        return 'Low'
    elif val < 50:
        return 'Medium'
    else:
        return 'High'

In [70]:
city_class = city_mpg.apply(mpg_class)
city_class

0        Low
1        Low
2        Low
3        Low
4        Low
        ... 
41139    Low
41140    Low
41141    Low
41142    Low
41143    Low
Name: city08, Length: 41144, dtype: object
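Calling a Python function once per value can be slow on large Series. A vectorized alternative for this kind of binning is `pd.cut`. The sketch below assumes the intended classes are Low below 25, Medium from 25 up to (not including) 50, and High otherwise, and uses a small made-up Series instead of `city_mpg`:

```python
import numpy as np
import pandas as pd

def mpg_class(val):
    # Assumed thresholds: Low < 25 <= Medium < 50 <= High
    if val < 25:
        return "Low"
    elif val < 50:
        return "Medium"
    return "High"

# Made-up values chosen to hit every class and both boundaries
mpg = pd.Series([19, 9, 25, 49, 50, 124])

with_apply = mpg.apply(mpg_class)

# right=False makes each bin closed on the left: [-inf, 25), [25, 50), [50, inf)
with_cut = pd.cut(
    mpg,
    bins=[-np.inf, 25, 50, np.inf],
    right=False,
    labels=["Low", "Medium", "High"],
)

print(with_apply.tolist())
print(with_cut.tolist())
```

Note that `pd.cut` returns a categorical Series, while `apply` here returns plain strings.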

## `copy`
Make a copy of this object’s indices and data.

In [71]:
# sometimes we want to modify a Series without the risk of breaking something:
# BEFORE the transformation we can make a copy of the Series, so we can be sure that the original Series
# will not be modified

new_index_city_mpg = city_mpg.copy(deep = True)
new_index_city_mpg.add_prefix('US-')

US-0        19
US-1         9
US-2        23
US-3        10
US-4        17
            ..
US-41139    19
US-41140    20
US-41141    18
US-41142    18
US-41143    16
Name: city08, Length: 41144, dtype: int64

In [72]:
# note that the original Series was not modified
city_mpg

0        19
1         9
2        23
3        10
4        17
         ..
41139    19
41140    20
41141    18
41142    18
41143    16
Name: city08, Length: 41144, dtype: int64
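The protection that `copy` gives is easiest to see with an in-place change. A minimal sketch on a hand-made Series (names and values are illustrative); it also shows that `add_prefix` returns a new Series rather than changing the one it is called on:

```python
import pandas as pd

original = pd.Series([19, 9, 23], name="city08")

# Deep copy: data and index are duplicated
duplicate = original.copy(deep=True)
duplicate.iloc[0] = 99  # in-place change, applied to the copy only

# add_prefix builds a new Series; the caller keeps its labels
relabeled = original.add_prefix("US-")

print(original.tolist(), duplicate.tolist())
print(list(original.index), list(relabeled.index))
```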

## `.str.contains`
Test if pattern or regex is contained within a string of a Series or Index.

Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.

In [75]:
car_brand = data.make
car_brand

0        Alfa Romeo
1           Ferrari
2             Dodge
3             Dodge
4            Subaru
            ...    
41139        Subaru
41140        Subaru
41141        Subaru
41142        Subaru
41143        Subaru
Name: make, Length: 41144, dtype: object

In [76]:
car_brand.str.contains('aru') # suppose we need to find the cars whose make contains 'aru'

0        False
1        False
2        False
3        False
4         True
         ...  
41139     True
41140     True
41141     True
41142     True
41143     True
Name: make, Length: 41144, dtype: bool
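The boolean result is typically used as a mask to filter rows. The sketch below uses a small made-up `makes` Series (including a missing value) rather than the real `make` column, and shows the `case`, `regex` and `na` parameters of `str.contains`:

```python
import pandas as pd

# Made-up sample; None stands in for a missing make
makes = pd.Series(["Alfa Romeo", "Ferrari", "Dodge", "Subaru", None], name="make")

# na=False treats missing values as "no match" so the mask is purely boolean
mask = makes.str.contains("aru", na=False)
subarus = makes[mask]

# Case-insensitive match
any_case = makes.str.contains("SUBARU", case=False, na=False)

# The pattern is a regular expression by default: '$' anchors it to the end
at_end = makes.str.contains(r"aru$", regex=True, na=False)

print(subarus.tolist())
```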

## `.str.endswith`
Similar to `contains`, but this method only matches at the end of each string.

In [77]:
car_brand.str.endswith('aru')

0        False
1        False
2        False
3        False
4         True
         ...  
41139     True
41140     True
41141     True
41142     True
41143     True
Name: make, Length: 41144, dtype: bool

## `.str.find`
Return the lowest index of the substring in each string of the Series/Index.

Each returned index is the position where the substring is fully contained within [start:end]. Returns -1 when the substring is not found.

In [78]:
car_brand.str.find('Subaru')

0       -1
1       -1
2       -1
3       -1
4        0
        ..
41139    0
41140    0
41141    0
41142    0
41143    0
Name: make, Length: 41144, dtype: int64
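A non-zero position makes the behaviour of `find` easier to see. This sketch uses a small made-up Series and also shows `str.rfind`, which returns the highest index instead of the lowest:

```python
import pandas as pd

# Made-up sample of makes
makes = pd.Series(["Subaru", "Alfa Romeo", "Dodge"])

first_o = makes.str.find("o")   # position of the first 'o', or -1
last_o = makes.str.rfind("o")   # position of the last 'o', or -1

# Comparing against -1 turns the positions into a filter mask
has_o = makes[first_o != -1]

print(first_o.tolist(), last_o.tolist())
print(has_o.tolist())
```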