### Series.array - Wrapper over ndarray

- The ExtensionArray of the data backing this Series or Index.

    - Returns: ExtensionArray
        - An ExtensionArray of the values stored within. For extension types, this is the actual array. For NumPy native types, this is a thin (no copy) wrapper around numpy.ndarray.
        - .array differs from .values, which may require converting the data to a different form.
        
- For regular NumPy types like int and float, a NumpyExtensionArray is returned (pandas versions before 2.1 call it PandasArray, which is what the output below shows).
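The difference between .array and .values is easiest to see with timezone-aware datetimes. The following is a minimal sketch on small hypothetical Series: the integer case checks that .array wraps the existing ndarray without copying it, and the datetime case shows .array keeping the timezone while .values falls back to a plain NumPy datetime64 array converted to UTC.

```python
import numpy as np
import pandas as pd

# NumPy-backed integer Series: .array wraps the existing ndarray without copying
ints = pd.Series([1, 2, 3])
ints_arr = ints.array
print(type(ints_arr).__name__)  # NumpyExtensionArray (PandasArray before pandas 2.1)
print(np.shares_memory(np.asarray(ints_arr), ints.to_numpy()))

# Timezone-aware datetimes (hypothetical dates): .array keeps the timezone
dates = pd.Series(pd.date_range("2000-01-01", periods=2, tz="Europe/Paris"))
print(type(dates.array).__name__)  # DatetimeArray
print(dates.array.dtype)           # timezone-aware datetime dtype

# .values converts to a plain NumPy datetime64 array (in UTC, timezone dropped)
print(dates.values.dtype)
```

The check with np.shares_memory is only there to make the "no copy" claim concrete; in normal code .array or .to_numpy() is simply used directly.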

In [2]:
import pandas as pd

In [3]:
pd.Series([1, 2, 3]).array

<PandasArray>
[1, 2, 3]
Length: 3, dtype: int64

- For extension types, like Categorical, the actual ExtensionArray is returned.

In [4]:
ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
ser.array

['a', 'b', 'a']
Categories (2, object): ['a', 'b']
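Because .array returns the Categorical stored in the Series, its categorical-specific attributes are available directly. A minimal sketch reusing the same three values: each entry is stored as a small integer code that points into .categories, while .to_numpy() materialises the labels as a plain NumPy array.

```python
import pandas as pd

ser = pd.Series(pd.Categorical(["a", "b", "a"]))
cat = ser.array

# For a categorical Series, .array is a Categorical, not a NumPy array
print(isinstance(cat, pd.Categorical))

# Each value is stored as an integer code pointing into .categories
print(cat.codes)  # [0 1 0]
print(list(cat.categories))

# .to_numpy() rebuilds the labels as a plain NumPy array
print(ser.to_numpy())
```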

### Converting a Series from a CSV file into an array

- Read an HR Analytics CSV file and convert one of its columns into a Series
- Convert that Series into an array with Series.array, which for NumPy-backed data is a thin wrapper over the underlying ndarray
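The steps above can be sketched end to end without the original file. This is a minimal sketch assuming a tiny hypothetical CSV held in memory via io.StringIO; the rows and EmpID values are made up, and only the column name EducationField comes from the notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for HR_Analytics.csv: a few made-up rows, two columns
csv_text = """EmpID,EducationField
E1,Life Sciences
E2,Medical
E3,Life Sciences
E4,Marketing
E5,Medical
"""

data = pd.read_csv(io.StringIO(csv_text))
field = data["EducationField"]  # selecting one column gives a Series

# Keep the first occurrence of each value, then renumber the index 0..n-1
unique_fields = field.drop_duplicates().reset_index(drop=True)

# Access the values as a pandas array
arr = unique_fields.array
print(arr)
```

Chaining drop_duplicates() and reset_index() returns new objects, so the original column in data is left untouched.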

In [8]:
data = pd.read_csv("HR_Analytics.csv")

# List the column names of the CSV file
list(data.columns)

['EmpID',
 'Age',
 'AgeGroup',
 'Attrition',
 'BusinessTravel',
 'DailyRate',
 'Department',
 'DistanceFromHome',
 'Education',
 'EducationField',
 'EmployeeCount',
 'EmployeeNumber',
 'EnvironmentSatisfaction',
 'Gender',
 'HourlyRate',
 'JobInvolvement',
 'JobLevel',
 'JobRole',
 'JobSatisfaction',
 'MaritalStatus',
 'MonthlyIncome',
 'SalarySlab',
 'MonthlyRate',
 'NumCompaniesWorked',
 'Over18',
 'OverTime',
 'PercentSalaryHike',
 'PerformanceRating',
 'RelationshipSatisfaction',
 'StandardHours',
 'StockOptionLevel',
 'TotalWorkingYears',
 'TrainingTimesLastYear',
 'WorkLifeBalance',
 'YearsAtCompany',
 'YearsInCurrentRole',
 'YearsSinceLastPromotion',
 'YearsWithCurrManager']

In [7]:
# Iterate over the columns
for i in data.columns:
    print(i, end=" ")

EmpID Age AgeGroup Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeCount EmployeeNumber EnvironmentSatisfaction Gender HourlyRate JobInvolvement JobLevel JobRole JobSatisfaction MaritalStatus MonthlyIncome SalarySlab MonthlyRate NumCompaniesWorked Over18 OverTime PercentSalaryHike PerformanceRating RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager 
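The loop above prints the names with a trailing space. A minimal sketch of equivalent idioms, assuming a hypothetical three-column CSV in place of the real file: data.columns.tolist() gives the same list as list(data.columns), and unpacking the columns into print produces the same space-separated line.

```python
import io

import pandas as pd

# Hypothetical three-column CSV standing in for HR_Analytics.csv
data = pd.read_csv(io.StringIO("EmpID,Age,EducationField\nE1,30,Medical\n"))

names = data.columns.tolist()  # same result as list(data.columns)
print(len(names))              # number of columns
print(*data.columns)           # space-separated, like the loop above
print(" ".join(data.columns))  # same text, built as a single string
```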

- Convert the EducationField column of the HR Analytics data into a Series

In [42]:
my_series = data["EducationField"].squeeze()
my_series

0       Life Sciences
1             Medical
2           Marketing
3       Life Sciences
4             Medical
            ...      
1475    Life Sciences
1476        Marketing
1477        Marketing
1478        Marketing
1479          Medical
Name: EducationField, Length: 1480, dtype: object

In [43]:
type(my_series)

pandas.core.series.Series
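Selecting a single column with a label, as in data["EducationField"], already returns a Series, so .squeeze() does not change its type here. squeeze matters when the selection is a one-column DataFrame, for example from a list of labels. A minimal sketch on a small hypothetical frame:

```python
import pandas as pd

# Hypothetical two-column frame
df = pd.DataFrame({"EducationField": ["Medical", "Marketing"], "Age": [30, 41]})

col = df["EducationField"]              # single label -> already a Series
one_col_frame = df[["EducationField"]]  # list of labels -> DataFrame

# squeeze collapses the single column of a one-column DataFrame into a Series
squeezed = one_col_frame.squeeze(axis="columns")

print(type(col).__name__)            # Series
print(type(one_col_frame).__name__)  # DataFrame
print(type(squeezed).__name__)       # Series
```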

In [44]:
# Show every row whose value occurs more than once (keep=False flags all occurrences)
my_series[my_series.duplicated(keep=False)]

0       Life Sciences
1             Medical
2           Marketing
3       Life Sciences
4             Medical
            ...      
1475    Life Sciences
1476        Marketing
1477        Marketing
1478        Marketing
1479          Medical
Name: EducationField, Length: 1480, dtype: object
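The filtered result still has all 1480 rows, which means every EducationField value occurs more than once. The keep argument controls which occurrences duplicated() flags: the default keep="first" leaves the first occurrence unflagged, while keep=False flags every occurrence of a repeated value. A minimal sketch on a small hypothetical Series:

```python
import pandas as pd

s = pd.Series(["Medical", "Marketing", "Medical", "Other"])

# Default keep="first": only later repeats are flagged
print(s.duplicated().tolist())  # [False, False, True, False]

# keep=False: every occurrence of a repeated value is flagged
print(s.duplicated(keep=False).tolist())  # [True, False, True, False]

# Boolean indexing keeps only the repeated values
print(s[s.duplicated(keep=False)].tolist())  # ['Medical', 'Medical']

# value_counts shows how often each value occurs
print(s.value_counts()["Medical"])  # 2
```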

In [45]:
my_series.drop_duplicates(inplace=True)

In [46]:
my_series

0        Life Sciences
1              Medical
2            Marketing
10    Technical Degree
13               Other
93     Human Resources
Name: EducationField, dtype: object
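drop_duplicates keeps the first occurrence of each value together with its original index label, which is why the labels above jump (0, 1, 2, 10, 13, 93). Instead of inplace=True, the result can also be assigned to a new name, leaving the original Series unchanged. A minimal sketch on a small hypothetical Series:

```python
import pandas as pd

s = pd.Series(["Medical", "Marketing", "Medical", "Other", "Marketing"])

unique_s = s.drop_duplicates()  # returns a new Series; s is unchanged
print(unique_s.index.tolist())  # [0, 1, 3]: labels of the first occurrences
print(unique_s.tolist())        # ['Medical', 'Marketing', 'Other']

# nunique counts distinct values without building a new Series
print(s.nunique())  # 3
print(len(s))       # 5
```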

In [22]:
# Count the values left after dropping duplicates, before resetting the index
len(my_series)

6

In [23]:
range(len(my_series))

range(0, 6)

In [47]:
my_series = my_series.reset_index(drop=True)

In [48]:
my_series

0       Life Sciences
1             Medical
2           Marketing
3    Technical Degree
4               Other
5     Human Resources
Name: EducationField, dtype: object
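The drop=True argument matters here: it discards the old labels and returns a Series with a fresh 0..n-1 index. With the default drop=False, reset_index on a Series returns a DataFrame in which the old labels become a column. A minimal sketch on a small hypothetical Series whose labels mimic the gaps left by drop_duplicates:

```python
import pandas as pd

s = pd.Series(["Medical", "Other"], index=[1, 13], name="EducationField")

# drop=True: discard the old labels, keep a Series with index 0..n-1
renumbered = s.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]

# Default drop=False: old labels become a column of a new DataFrame
with_old = s.reset_index()
print(type(with_old).__name__)    # DataFrame
print(with_old.columns.tolist())  # ['index', 'EducationField']
```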

In [49]:
# Convert the Series into an array
my_series.array

<PandasArray>
[   'Life Sciences',          'Medical',        'Marketing',
 'Technical Degree',            'Other',  'Human Resources']
Length: 6, dtype: object
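The object returned by .array is a pandas ExtensionArray, not a numpy.ndarray itself. When an actual ndarray is needed, Series.to_numpy() (or np.asarray on the array) returns one. A minimal sketch using three of the values shown above:

```python
import numpy as np
import pandas as pd

s = pd.Series(["Life Sciences", "Medical", "Marketing"])

arr = s.array          # pandas ExtensionArray wrapping the data
nd = s.to_numpy()      # a real numpy.ndarray
nd2 = np.asarray(arr)  # converting the pandas array also gives an ndarray

print(isinstance(arr, np.ndarray))  # False
print(isinstance(nd, np.ndarray))   # True
print(nd.tolist())
```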