# Pandas accessors

- [Towards Data Science: Pandas dtype-specific operations (accessors)](https://towardsdatascience.com/pandas-dtype-specific-operations-accessors-c749bafb30a4)
- [str accessor docs](https://pandas.pydata.org/docs/reference/series.html#api-series-str)
- [dt accessor docs](https://pandas.pydata.org/docs/reference/series.html#api-series-dt)

Pandas provides several accessors; the two most common are `str` for strings and `dt` for datetimes. An accessor exposes operations dedicated to one particular data type, such as converting every string in a Series to uppercase.
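A minimal sketch of the idea on two small hand-made Series (the values are copied from the first rows of the employee data used below): each accessor is only available when the Series holds the matching data type, and pandas raises an `AttributeError` otherwise.

```python
import pandas as pd

names = pd.Series(['King', 'Kochhar', 'De Haan'])
dates = pd.to_datetime(pd.Series(['1997-06-17', '1999-09-21']))

upper_names = names.str.upper()  # string-specific operation via .str
hire_years = dates.dt.year       # datetime-specific operation via .dt

# Using .str on a Series of integers fails with AttributeError
try:
    pd.Series([1, 2, 3]).str.upper()
    str_on_ints_failed = False
except AttributeError:
    str_on_ints_failed = True

print(upper_names.tolist())  # ['KING', 'KOCHHAR', 'DE HAAN']
print(hire_years.tolist())   # [1997, 1999]
print(str_on_ints_failed)    # True
```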

In [1]:
import pandas as pd

In [2]:
url = 'https://raw.githubusercontent.com/piotrgradzinski/dap_20230114/main/day_6_pgg/emps.csv'
emps = pd.read_csv(url, sep=';', encoding='utf-8', index_col='employee_id', parse_dates=['hire_date'])
emps

| employee_id | first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | Steven | King | President | 24000 | 1997-06-17 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 101 | Neena | Kochhar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 102 | Lex | De Haan | Administration Vice President | 17000 | 2003-01-13 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 103 | Alexander | Hunold | Programmer | 9000 | 2000-01-03 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| 104 | Bruce | Ernst | Programmer | 6000 | 2001-05-21 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 202 | Pat | Fay | Marketing Representative | 6000 | 2007-08-17 | Marketing | 147 Spadina Ave | M5V 2L7 | Toronto | Canada |
| 203 | Susan | Mavris | Human Resources Representative | 6500 | 2004-06-07 | Human Resources | 8204 Arthur St |  | London | United Kingdom |
| 204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
| 205 | Shelley | Higgins | Accounting Manager | 12000 | 2004-06-07 | Accounting | 2004 Charade Rd | 98199 | Seattle | United States of America |
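The cell above downloads the file from GitHub. To experiment offline, here is a minimal sketch that feeds `read_csv` a few of the rows shown above through `io.StringIO`, assuming the file uses the same `;`-separated layout (only a subset of the columns is reproduced). `parse_dates` is what turns `hire_date` into a datetime column, which the `dt` accessor needs.

```python
import io
import pandas as pd

# A few rows copied from the emps output above, in a ';'-separated layout
csv_text = """employee_id;first_name;last_name;salary;hire_date;city
100;Steven;King;24000;1997-06-17;Seattle
101;Neena;Kochhar;17000;1999-09-21;Seattle
203;Susan;Mavris;6500;2004-06-07;London
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    sep=';',                    # fields are separated by semicolons
    index_col='employee_id',    # use the ID column as the row index
    parse_dates=['hire_date'],  # convert text dates to datetimes so .dt works
)

print(sample.index.name)              # employee_id
print(sample.hire_date.dt.year.tolist())  # [1997, 1999, 2004]
```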


In [4]:
emps.dtypes

first_name                 object
last_name                  object
job_title                  object
salary                      int64
hire_date          datetime64[ns]
department_name            object
address                    object
postal_code                object
city                       object
country                    object
dtype: object

## `str` accessor

In [6]:
emps.last_name.str.upper()

employee_id
100       KING
101    KOCHHAR
102    DE HAAN
103     HUNOLD
104      ERNST
        ...   
202        FAY
203     MAVRIS
204       BAER
205    HIGGINS
206      GIETZ
Name: last_name, Length: 107, dtype: object

In [8]:
emps.last_name.str.lower()

employee_id
100       king
101    kochhar
102    de haan
103     hunold
104      ernst
        ...   
202        fay
203     mavris
204       baer
205    higgins
206      gietz
Name: last_name, Length: 107, dtype: object
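Every `str` method returns a new Series, so chaining several string operations means repeating `.str` before each call. A small sketch on deliberately messy, illustrative values (not from the dataset), also contrasting `title()` with `capitalize()`:

```python
import pandas as pd

last_names = pd.Series(['  king', 'KOCHHAR ', 'de haan'])

# strip() removes surrounding whitespace; .str is needed again before title()
cleaned = last_names.str.strip().str.title()

# capitalize() only uppercases the very first character of each string
capitalized = last_names.str.strip().str.capitalize()

print(cleaned.tolist())      # ['King', 'Kochhar', 'De Haan']
print(capitalized.tolist())  # ['King', 'Kochhar', 'De haan']
```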

The `str` accessor also supports the indexing operator, so we can slice every string in a Series the same way we would slice a single Python string.

In [10]:
emps.last_name.str[0:3]

employee_id
100    Kin
101    Koc
102    De 
103    Hun
104    Ern
      ... 
202    Fay
203    Mav
204    Bae
205    Hig
206    Gie
Name: last_name, Length: 107, dtype: object
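The indexing operator has method equivalents: `str.slice(start, stop)` for slices and `str.get(i)` for a single position. A short sketch on a few last names from the data:

```python
import pandas as pd

last_names = pd.Series(['King', 'Kochhar', 'De Haan', 'Fay'])

first_three = last_names.str[0:3]        # slicing, as in the cell above
same_thing = last_names.str.slice(0, 3)  # explicit method form of the same slice
last_letter = last_names.str[-1]         # single position, counted from the end
initials = last_names.str.get(0)         # method form of .str[0]

print(first_three.equals(same_thing))  # True
print(last_letter.tolist())            # ['g', 'r', 'n', 'y']
print(initials.tolist())               # ['K', 'K', 'D', 'F']
```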

In [12]:
emps.last_name.str.replace('K', 'X')

employee_id
100       Xing
101    Xochhar
102    De Haan
103     Hunold
104      Ernst
        ...   
202        Fay
203     Mavris
204       Baer
205    Higgins
206      Gietz
Name: last_name, Length: 107, dtype: object
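The default of the `regex` parameter of `str.replace` changed from `True` to `False` in pandas 2.0, so passing it explicitly avoids surprises across versions. A sketch using the Munich address from the table above, where the dot means something very different in the two modes:

```python
import pandas as pd

address = pd.Series(['Schwanthalerstr. 7031'])

# regex=False: '.' is a literal dot, so only the dot is removed
literal = address.str.replace('.', '', regex=False)

# regex=True: '.' matches any character, so every character is removed
pattern = address.str.replace('.', '', regex=True)

# A regular expression can also be anchored, e.g. drop the trailing house number
street_only = address.str.replace(r'\s*\d+$', '', regex=True)

print(literal.tolist())      # ['Schwanthalerstr 7031']
print(pattern.tolist())      # ['']
print(street_only.tolist())  # ['Schwanthalerstr.']
```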

In [16]:
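# https://pandas.pydata.org/docs/reference/api/pandas.Series.str.match.html
# str.match takes a regular expression and checks it from the beginning of each string,
# hence the leading .* to find 'in' anywhere in the name (the match is case-sensitive)
emps.last_name.str.match('.*in.*')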

employee_id
100     True
101    False
102    False
103    False
104    False
       ...  
202    False
203    False
204    False
205     True
206    False
Name: last_name, Length: 107, dtype: bool
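Because `str.match` is anchored at the start of the string, `str.contains` is usually the simpler choice for "appears anywhere", and `str.fullmatch` (available since pandas 1.1) requires the whole string to match. A sketch comparing them on three last names from the data:

```python
import pandas as pd

last_names = pd.Series(['King', 'Kochhar', 'Higgins'])

anchored = last_names.str.match('in')                # must match at the start
anywhere = last_names.str.contains('in')             # searches anywhere in the string
whole = last_names.str.fullmatch('K.*')              # the entire string must match
any_case = last_names.str.contains('k', case=False)  # case-insensitive search

print(anchored.tolist())  # [False, False, False]
print(anywhere.tolist())  # [True, False, True]
print(whole.tolist())     # [True, True, False]
print(any_case.tolist())  # [True, True, False]
```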

## `dt` accessor

In [18]:
emps.hire_date.dt.year

employee_id
100    1997
101    1999
102    2003
103    2000
104    2001
       ... 
202    2007
203    2004
204    2004
205    2004
206    2004
Name: hire_date, Length: 107, dtype: int64
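`year` is one of many date components exposed by `dt`. A small sketch extracting several of them at once from three hire dates that appear in this notebook; note that `day_name()` is a method, while the others are properties:

```python
import pandas as pd

hire_date = pd.to_datetime(pd.Series(['1997-06-17', '2007-09-30', '2005-05-01']))

parts = pd.DataFrame({
    'year': hire_date.dt.year,
    'month': hire_date.dt.month,
    'day': hire_date.dt.day,
    'quarter': hire_date.dt.quarter,
    'weekday': hire_date.dt.day_name(),  # method call, returns the weekday name
})
print(parts)
```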

In [23]:
emps.hire_date.dt.is_month_end

employee_id
100    False
101    False
102    False
103    False
104    False
       ...  
202    False
203    False
204    False
205    False
206    False
Name: hire_date, Length: 107, dtype: bool

In [25]:
emps[emps.hire_date.dt.is_month_end]

| employee_id | first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country |
|---|---|---|---|---|---|---|---|---|---|---|
| 111 | Ismael | Sciarra | Accountant | 7700 | 2007-09-30 | Finance | 2004 Charade Rd | 98199 | Seattle | United States of America |


In [27]:
emps[emps.hire_date.dt.is_month_start]

| employee_id | first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country |
|---|---|---|---|---|---|---|---|---|---|---|
| 122 | Payam | Kaufling | Stock Manager | 7900 | 2005-05-01 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 145 | John | Russell | Sales Manager | 14000 | 2006-10-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 158 | Allan | McEwen | Sales Representative | 9000 | 2006-08-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 194 | Samuel | McCain | Shipping Clerk | 3200 | 2008-07-01 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
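The boolean Series returned by `dt` properties can be combined with `|` (or) and `&` (and) before filtering. A minimal sketch on a tiny frame built from three rows shown above (only `last_name` and `hire_date` are reproduced):

```python
import pandas as pd

hires = pd.DataFrame(
    {'last_name': ['King', 'Sciarra', 'Kaufling'],
     'hire_date': pd.to_datetime(['1997-06-17', '2007-09-30', '2005-05-01'])},
    index=pd.Index([100, 111, 122], name='employee_id'),
)

# Hired on the first or the last day of a month
edge_of_month = hires[hires.hire_date.dt.is_month_start | hires.hire_date.dt.is_month_end]

# Comparisons must be wrapped in parentheses because & binds more tightly than >=
month_start_since_2005 = hires[(hires.hire_date.dt.year >= 2005) & hires.hire_date.dt.is_month_start]

print(edge_of_month.index.tolist())           # [111, 122]
print(month_start_since_2005.index.tolist())  # [122]
```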


In [28]:
emps.hire_date.dt.strftime('%d.%m.%Y')

employee_id
100    17.06.1997
101    21.09.1999
102    13.01.2003
103    03.01.2000
104    21.05.2001
          ...    
202    17.08.2007
203    07.06.2004
204    07.06.2004
205    07.06.2004
206    07.06.2004
Name: hire_date, Length: 107, dtype: object
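`strftime` turns datetimes into plain strings, so the result no longer supports `dt`. If the formatted text has to be parsed again, `pd.to_datetime` with the same format string reverses it. A short round-trip sketch on two hire dates from the data:

```python
import pandas as pd

hire_date = pd.to_datetime(pd.Series(['1997-06-17', '2004-06-07']))

as_text = hire_date.dt.strftime('%d.%m.%Y')        # datetimes -> formatted strings
back = pd.to_datetime(as_text, format='%d.%m.%Y')  # strings -> datetimes again

print(as_text.tolist())           # ['17.06.1997', '07.06.2004']
print((back == hire_date).all())  # True
```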