# Introductory Material

Data encompasses a collection of discrete objects, numbers, words, events, facts, measurements, observations, or even descriptions of things.

## Making Sense of Data
- Numerical Data
> Discrete Data
> 
> Continuous Data
- Categorical Data
> Dichotomous Variable
>
> Polytomous Variable
- Measurement Scales
> Nominal
> 
> Ordinal
>
> Interval
>
> Ratio
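
The categories above can be illustrated with a small pandas sketch. The records below are invented purely for illustration, and the column names are hypothetical; the point is only to show one plausible column for each kind of data.

```python
import pandas as pd

# Hypothetical records, invented for illustration only
people = pd.DataFrame({
    "siblings": [0, 2, 1],                 # numerical, discrete (counts); ratio scale
    "height_cm": [162.5, 180.1, 171.3],    # numerical, continuous; ratio scale
    "temp_celsius": [36.6, 37.2, 36.9],    # numerical, continuous; interval scale (no true zero)
    "smoker": ["no", "yes", "no"],         # categorical, dichotomous; nominal scale
    "blood_type": ["A", "O", "AB"],        # categorical, polytomous; nominal scale
})

# Ordinal scale: categories with a meaningful order
people["education"] = pd.Categorical(
    ["high school", "master", "bachelor"],
    categories=["high school", "bachelor", "master"],
    ordered=True,
)

print(people.dtypes)
# Because the categories are ordered, comparisons and min/max are meaningful
print(people["education"].min())
print(people["education"] > "high school")
```

An unordered (nominal) column such as `blood_type` supports equality checks and counting, but not ordering comparisons.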

The "why" of exploratory data analysis (EDA) is to turn data into information, and then to turn that information into knowledge.

EDA fits into a broader set of activities called data analysis.

The stages of data analysis are as follows:
1. Data requirements
2. Data collection
3. Data processing
4. Data cleaning
5. EDA
6. Modeling and algorithm
7. Data product
8. Communication

## Primary Aim of EDA
To examine what data can tell us before actually going through formal modeling or hypothesis formulation.

## Significance of EDA
EDA reveals the ground truth about the content of the data without making any underlying assumptions.

## Steps in EDA
1. Problem definition
2. Data preparation
3. Data analysis
4. Development and representation of results

## Activities of EDA
- Discover patterns
- Spot anomalies
- Test hypotheses
- Check assumptions using statistical measures
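
As a rough sketch of what these activities can look like in practice, the example below uses a tiny, made-up dataset (the name `df_demo` and its values are hypothetical). It uses summary statistics and a correlation to look for patterns, and the common 1.5 x IQR rule to flag anomalies. This is one possible approach, not a prescribed procedure.

```python
import pandas as pd

# Hypothetical measurements, invented for illustration only
df_demo = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8],
    "exam_score": [52, 55, 61, 64, 70, 73, 79, 20],
})

# Discover patterns: summary statistics for every numeric column
print(df_demo.describe())

# Spot anomalies: flag scores outside 1.5 * IQR of the quartiles
q1 = df_demo["exam_score"].quantile(0.25)
q3 = df_demo["exam_score"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df_demo["exam_score"] < q1 - 1.5 * iqr) | (df_demo["exam_score"] > q3 + 1.5 * iqr)
outliers = df_demo[is_outlier]
print(outliers)

# Check assumptions: compare the correlation with and without the flagged rows
corr_all = df_demo["hours_studied"].corr(df_demo["exam_score"])
clean = df_demo[~is_outlier]
corr_clean = clean["hours_studied"].corr(clean["exam_score"])
print(corr_all, corr_clean)
```

Comparing the two correlations shows how a single anomalous row can hide an otherwise strong linear relationship.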

In [73]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# NumPy

## For creating different types of NumPy arrays

In [3]:
my1DArray = np.array([1, 8, 27, 64])
print(my1DArray)

[ 1  8 27 64]


In [4]:
my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
print(my2DArray)

[[ 1  2  3  4]
 [ 2  4  9 16]
 [ 4  8 18 32]]


In [5]:
my3DArray = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]], [[1, 2, 3, 4], [9, 10, 11, 12]]])
print(my3DArray)

[[[ 1  2  3  4]
  [ 5  6  7  8]]

 [[ 1  2  3  4]
  [ 9 10 11 12]]]


## For displaying basic information, such as the data type, shape, size, and strides of a NumPy array

In [6]:
print(my2DArray.data)

<memory at 0x7fa401336450>


In [7]:
print(my2DArray.shape)

(3, 4)


In [8]:
print(my2DArray.dtype)

int64


In [9]:
print(my2DArray.strides)

(32, 8)


In [10]:
print(my3DArray.shape)

(2, 2, 4)


### Strides
Strides describe how a NumPy array is laid out in memory: for each dimension, the stride is the number of bytes to jump to reach the next element along that dimension. Knowing the strides is useful when doing computations with arrays because, together with the shape and data type, they describe the memory layout.

For example, consider a 1D array of 8 numbers stored as 8-byte values (such as `int64`). The stride for this array is 8, which means that to find the next element, you need to jump 8 bytes forward in memory.

Strides also apply to multidimensional arrays. For example, consider a C-contiguous 2D array of 4x4 numbers, again with 8-byte elements. The stride for the first dimension is 32 (4 elements x 8 bytes), so moving to the next row means jumping 32 bytes forward in memory. The stride for the second dimension is 8, so moving to the next column means jumping 8 bytes forward.

NumPy relies on strides to implement slicing, indexing, and broadcasting without copying data. A slice with a step, such as `a[::2]`, is a view whose stride is multiplied by the step. Indexing computes an element's byte offset as the sum of each index multiplied by the corresponding stride. Broadcasting repeats data along a dimension by giving that dimension a stride of 0.
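
The following minimal sketch shows strides for small arrays. It sets `dtype=np.int64` explicitly so that each element occupies 8 bytes regardless of platform; the variable names are chosen only for this example.

```python
import numpy as np

# 1D array of 8 int64 values: each element is 8 bytes apart
a = np.arange(8, dtype=np.int64)
print(a.strides)        # (8,)

# 4x4 int64 array: 32 bytes to the next row, 8 bytes to the next column
b = np.arange(16, dtype=np.int64).reshape(4, 4)
print(b.strides)        # (32, 8)

# Slicing with a step returns a view with a larger stride, not a copy
print(a[::2].strides)   # (16,)

# Indexing: the byte offset of b[2, 3] is the sum of index * stride
offset = 2 * b.strides[0] + 3 * b.strides[1]
print(offset)           # 88

# Broadcasting: the repeated dimension gets a stride of 0
row = np.arange(4, dtype=np.int64)
c = np.broadcast_to(row, (3, 4))
print(c.strides)        # (0, 8)
```

Because slices and broadcast results share memory with the original array, `np.broadcast_to` returns a read-only view.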

## For creating an array using built-in NumPy functions

In [11]:
ones = np.ones((3,4))
print(ones)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [12]:
zeros = np.zeros((2,3,4))
print(zeros)

[[[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]]


In [13]:
# np.empty allocates memory without initializing it, so the printed values are arbitrary
emptyArray = np.empty((3,2))
print(emptyArray)

[[0. 0.]
 [0. 0.]
 [0. 0.]]


In [14]:
fullArray = np.full((2,2),7)
print(fullArray)

[[7 7]
 [7 7]]


In [15]:
evenSpacedArray = np.arange(10,25,5)
print(evenSpacedArray)

[10 15 20]


In [16]:
evenSpacedArray2 = np.linspace(0,2,9)
print(evenSpacedArray2)

[0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]


## For NumPy arrays and file operations

In [17]:
# Save a numpy array into file
x = np.arange(0.0,50.0,1.0)
np.savetxt('data.out', x, delimiter=',')

In [18]:
# Loading numpy array from text
z = np.loadtxt('data.out', unpack=True)
print(z)

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35.
 36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49.]


In [19]:
# Loading numpy array using genfromtxt method
# Note: 'data.out' has no header row, so skip_header=1 drops the first value (0.0)
my_array2 = np.genfromtxt('data.out', skip_header=1, filling_values=-999)
print(my_array2)

[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17. 18.
 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36.
 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49.]
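
The `filling_values` argument has no visible effect above because `data.out` contains no missing entries. As a small sketch of what it does, the example below parses a made-up CSV string (the `text` variable is hypothetical) in which one field is empty.

```python
from io import StringIO

import numpy as np

# Hypothetical CSV text with one missing field in the second row
text = "1,2,3\n4,,6"

# Missing fields are replaced by the value given in filling_values
filled = np.genfromtxt(StringIO(text), delimiter=",", filling_values=-999)
print(filled)
```

Without `filling_values`, the missing field in a float array would be read as `nan`.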


## For inspecting NumPy arrays

In [20]:
# print the number of `my2DArray`'s dimensions
print(my2DArray.ndim)

2


In [21]:
# print the number of `my2DArray`'s elements
print(my2DArray.size)

12


In [22]:
# print information about `my2DArray`'s memory layout
print(my2DArray.flags)

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False



In [23]:
# print the length of one array element in bytes
print(my2DArray.itemsize)

8


In [24]:
# print the total consumed bytes by `my2DArray`'s elements
print(my2DArray.nbytes)

96


## Broadcasting

Broadcasting is a mechanism that permits NumPy to operate with arrays of different shapes when performing arithmetic operations.

In [25]:
# Rule 1: Two dimensions are compatible if they are equal
# Create an array of two dimensions
A = np.ones((6, 8))
# Shape of A
print(A.shape)

(6, 8)


In [26]:
# Create another array
B = np.random.random((6, 8))
# Shape of B
print(B.shape)

(6, 8)


In [27]:
# Sum of A and B, here the shape of both matrices is the same
print(A+B)

[[1.11695207 1.23613529 1.42770547 1.28249983 1.77300434 1.16111793
  1.97118603 1.95167633]
 [1.37845491 1.04683873 1.54297138 1.13707935 1.59711756 1.45246577
  1.90547689 1.09253575]
 [1.13460759 1.5476109  1.55691592 1.95216007 1.02479364 1.37148834
  1.30444185 1.67932041]
 [1.05771197 1.75828092 1.05480353 1.21312451 1.23095423 1.04024695
  1.50225639 1.3218552 ]
 [1.51549101 1.41829716 1.49857709 1.80009802 1.10280446 1.98321045
  1.89613249 1.1998268 ]
 [1.62178877 1.13539284 1.08381112 1.31198673 1.65846805 1.94699439
  1.19615047 1.87987499]]


In [28]:
# Rule 2: Two dimensions are also compatible when one of them is 1
# Initialize `x`
x = np.ones((3, 4))
print(x)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [29]:
# Check shape of `x`
print(x.shape)

(3, 4)


In [30]:
# Initialize `y`
y = np.arange(4)
print(y)

[0 1 2 3]


In [31]:
# Check shape of `y`
print(y.shape)

(4,)


In [32]:
# Subtract `y` from `x`
print(x - y)

[[ 1.  0. -1. -2.]
 [ 1.  0. -1. -2.]
 [ 1.  0. -1. -2.]]


In [33]:
# Rule 3: Arrays can be broadcast together if they are compatible in all dimensions
x = np.ones((6, 8))
y = np.random.random((10, 1, 8))
print(x + y)

[[[1.30525452 1.95203257 1.77421897 1.02667771 1.39013186 1.12003725
   1.86543265 1.4899271 ]
  [1.30525452 1.95203257 1.77421897 1.02667771 1.39013186 1.12003725
   1.86543265 1.4899271 ]
  [1.30525452 1.95203257 1.77421897 1.02667771 1.39013186 1.12003725
   1.86543265 1.4899271 ]
  [1.30525452 1.95203257 1.77421897 1.02667771 1.39013186 1.12003725
   1.86543265 1.4899271 ]
  [1.30525452 1.95203257 1.77421897 1.02667771 1.39013186 1.12003725
   1.86543265 1.4899271 ]
  [1.30525452 1.95203257 1.77421897 1.02667771 1.39013186 1.12003725
   1.86543265 1.4899271 ]]

 [[1.22624473 1.58327198 1.711515   1.57837642 1.68675302 1.47595048
   1.21712565 1.6796771 ]
  [1.22624473 1.58327198 1.711515   1.57837642 1.68675302 1.47595048
   1.21712565 1.6796771 ]
  [1.22624473 1.58327198 1.711515   1.57837642 1.68675302 1.47595048
   1.21712565 1.6796771 ]
  [1.22624473 1.58327198 1.711515   1.57837642 1.68675302 1.47595048
   1.21712565 1.6796771 ]
  ...

Why did the above work? It comes down to the following:

The shapes of `x`, (6, 8), and `y`, (10, 1, 8), are compared from the last dimension to the first.

- Compare x's last dimension (8) with y's last dimension (8): They are equal.
- Compare x's second-to-last dimension (6) with y's second-to-last dimension (1): One of them is 1, so y is stretched to 6 along that dimension.
- y has an additional dimension at the front (10) which x lacks, so x's shape is implicitly extended with a new leading dimension of size 1, which is then stretched to 10.

The result therefore has shape (10, 6, 8).
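
The comparison above can be checked without computing the full result. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available, and also shows a pair of shapes that cannot be broadcast together.

```python
import numpy as np

x = np.ones((6, 8))
y = np.random.random((10, 1, 8))

# Ask NumPy which shape the broadcast result will have
result_shape = np.broadcast_shapes(x.shape, y.shape)
print(result_shape)      # (10, 6, 8)
print((x + y).shape)     # (10, 6, 8)

# Incompatible shapes: 6 and 10 differ and neither is 1
try:
    np.ones((6, 8)) + np.ones((10, 8))
    failed = False
except ValueError as err:
    failed = True
    print("Cannot broadcast:", err)
```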

## For seeing NumPy mathematics at work

In [34]:
# Basic operations (+, -, *, /, %)
x = np.array([[1, 2, 3], [2, 3, 4]])
y = np.array([[1, 4, 9], [2, 3, -2]])

In [35]:
# Add the two arrays
add = np.add(x, y)
print(add)

[[ 2  6 12]
 [ 4  6  2]]


In [36]:
# Subtract the two arrays
sub = np.subtract(x, y)
print(sub)

[[ 0 -2 -6]
 [ 0  0  6]]


In [37]:
# Multiply the two arrays
mul = np.multiply(x, y)
print(mul)

[[ 1  8 27]
 [ 4  9 -8]]


In [38]:
# Divide the two arrays
div = np.divide(x, y)
print(div)

[[ 1.          0.5         0.33333333]
 [ 1.          1.         -2.        ]]


In [39]:
# Calculate the element-wise remainder of x divided by y
rem = np.remainder(x, y)
print(rem)

[[0 2 3]
 [0 0 0]]


## Create a subset and slice an array using an index

In [40]:
x = np.array([10, 20, 30, 40, 50])

In [41]:
# Select items at index 0 and 1
print(x[0:2])

[10 20]


In [42]:
# Select the items in rows 0 and 1 of column 1 from a 2D array
y = np.array([[1, 2, 3, 4], [9, 10, 11, 12]])
print(y[0:2,1])

[ 2 10]


In [43]:
# Specifying conditions: select the elements that are greater than or equal to 2
atLeast2 = (y >= 2)
print(y[atLeast2])

[ 2  3  4  9 10 11 12]


# Pandas

In [44]:
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)

In [45]:
# Create a Series
series = pd.Series([2, 3, 7, 11, 13, 17, 19, 23])
print(series)

0     2
1     3
2     7
3    11
4    13
5    17
6    19
7    23
dtype: int64


In [46]:
# Creating a dataframe from a dictionary of Series, arrays, and scalars
series_df = pd.DataFrame({
    'A':range(1,5),
    'B':pd.Timestamp('20190526'),
    'C':pd.Series(5,index=list(range(4)), dtype='float64'),
    'D':np.array([3]*4,dtype='int64'),
    'E':pd.Categorical(["Depression", "Social Anxiety", "Bipolar Disorder", "Eating Disorder"]),
    'F':'Mental Health',
    'G':'is Challenging'
})
print(series_df)

   A          B    C  D                 E              F               G
0  1 2019-05-26  5.0  3        Depression  Mental Health  is Challenging
1  2 2019-05-26  5.0  3    Social Anxiety  Mental Health  is Challenging
2  3 2019-05-26  5.0  3  Bipolar Disorder  Mental Health  is Challenging
3  4 2019-05-26  5.0  3   Eating Disorder  Mental Health  is Challenging


In [47]:
# Creating a dataframe from a list of dictionaries
dict_df = [{'A': 'Apple', 'B': 'Ball'}, {'A':'Aeroplane', 'B': 'Bat', 'C': 'Cat'}]
dict_df = pd.DataFrame(dict_df)
print(dict_df)

           A     B    C
0      Apple  Ball  NaN
1  Aeroplane   Bat  Cat


In [48]:
dict_df.head()

           A     B    C
0      Apple  Ball  NaN
1  Aeroplane   Bat  Cat


In [49]:
# Creating a dataframe from ndarrays
sdf = {
    'County':['Ostfold', 'Hordaland', 'Oslo', 'Hedmark', 'Oppland', 'Buskerud'],
    'ISO-Code':[1, 2, 3, 4, 5, 6],
    'Area': [4180.69, 4917.94, 454.07, 27397.76, 25192.10, 14910.94],
    'Administrative centre': ["Sarpsborg", "Oslo", "City of Oslo", "Hamar", "Lillehammer", "Drammen"]
}
sdf = pd.DataFrame(sdf)
print(sdf)

      County  ISO-Code      Area Administrative centre
0    Ostfold         1   4180.69             Sarpsborg
1  Hordaland         2   4917.94                  Oslo
2       Oslo         3    454.07          City of Oslo
3    Hedmark         4  27397.76                 Hamar
4    Oppland         5  25192.10           Lillehammer
5   Buskerud         6  14910.94               Drammen


In [50]:
# Loading a dataset from an external source into a pandas DataFrame
columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
          'marital_status', 'occupation', 'relationship', 'ethnicity',
          'gender', 'capital_gain', 'capital_loss', 'hours_per_week', 
          'country_of_origin', 'income']
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data',names=columns)
df.head(10)

Unnamed: 0,age,workclass,fnlwgt,education,education_num,marital_status,occupation,relationship,ethnicity,gender,capital_gain,capital_loss,hours_per_week,country_of_origin,income
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
5,37,Private,284582,Masters,14,Married-civ-spouse,Exec-managerial,Wife,White,Female,0,0,40,United-States,<=50K
6,49,Private,160187,9th,5,Married-spouse-absent,Other-service,Not-in-family,Black,Female,0,0,16,Jamaica,<=50K
7,52,Self-emp-not-inc,209642,HS-grad,9,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,>50K
8,31,Private,45781,Masters,14,Never-married,Prof-specialty,Not-in-family,White,Female,14084,0,50,United-States,>50K
9,42,Private,159449,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,5178,0,40,United-States,>50K


In [51]:
# Display rows, columns, data types, and memory used by the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   age                32561 non-null  int64 
 1   workclass          32561 non-null  object
 2   fnlwgt             32561 non-null  int64 
 3   education          32561 non-null  object
 4   education_num      32561 non-null  int64 
 5   marital_status     32561 non-null  object
 6   occupation         32561 non-null  object
 7   relationship       32561 non-null  object
 8   ethnicity          32561 non-null  object
 9   gender             32561 non-null  object
 10  capital_gain       32561 non-null  int64 
 11  capital_loss       32561 non-null  int64 
 12  hours_per_week     32561 non-null  int64 
 13  country_of_origin  32561 non-null  object
 14  income             32561 non-null  object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB


In [56]:
# Select rows and columns in any dataframe
# Select a single row by position: iloc[10] returns the 11th row (positions start at 0) as a Series
df.iloc[10]

age                                   37
workclass                        Private
fnlwgt                            280464
education                   Some-college
education_num                         10
marital_status        Married-civ-spouse
occupation               Exec-managerial
relationship                     Husband
ethnicity                          Black
gender                              Male
capital_gain                           0
capital_loss                           0
hours_per_week                        80
country_of_origin          United-States
income                              >50K
Name: 10, dtype: object

In [58]:
# Select the first 10 rows
df.iloc[0:10]

Unnamed: 0,age,workclass,fnlwgt,education,education_num,marital_status,occupation,relationship,ethnicity,gender,capital_gain,capital_loss,hours_per_week,country_of_origin,income
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
5,37,Private,284582,Masters,14,Married-civ-spouse,Exec-managerial,Wife,White,Female,0,0,40,United-States,<=50K
6,49,Private,160187,9th,5,Married-spouse-absent,Other-service,Not-in-family,Black,Female,0,0,16,Jamaica,<=50K
7,52,Self-emp-not-inc,209642,HS-grad,9,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,>50K
8,31,Private,45781,Masters,14,Never-married,Prof-specialty,Not-in-family,White,Female,14084,0,50,United-States,>50K
9,42,Private,159449,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,5178,0,40,United-States,>50K


In [59]:
# Select a range of rows
df.iloc[10:15]

Unnamed: 0,age,workclass,fnlwgt,education,education_num,marital_status,occupation,relationship,ethnicity,gender,capital_gain,capital_loss,hours_per_week,country_of_origin,income
10,37,Private,280464,Some-college,10,Married-civ-spouse,Exec-managerial,Husband,Black,Male,0,0,80,United-States,>50K
11,30,State-gov,141297,Bachelors,13,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,0,0,40,India,>50K
12,23,Private,122272,Bachelors,13,Never-married,Adm-clerical,Own-child,White,Female,0,0,30,United-States,<=50K
13,32,Private,205019,Assoc-acdm,12,Never-married,Sales,Not-in-family,Black,Male,0,0,50,United-States,<=50K
14,40,Private,121772,Assoc-voc,11,Married-civ-spouse,Craft-repair,Husband,Asian-Pac-Islander,Male,0,0,40,?,>50K


In [60]:
# Select the last 2 rows
df.iloc[-2:]

Unnamed: 0,age,workclass,fnlwgt,education,education_num,marital_status,occupation,relationship,ethnicity,gender,capital_gain,capital_loss,hours_per_week,country_of_origin,income
32559,22,Private,201490,HS-grad,9,Never-married,Adm-clerical,Own-child,White,Male,0,0,20,United-States,<=50K
32560,52,Self-emp-inc,287927,HS-grad,9,Married-civ-spouse,Exec-managerial,Wife,White,Female,15024,0,40,United-States,>50K


In [61]:
# Select every other row, and the columns at positions 3 and 4 (the slice 3:5 excludes 5)
df.iloc[::2, 3:5].head()

   education  education_num
0  Bachelors             13
2    HS-grad              9
4  Bachelors             13
6        9th              5
8    Masters             14


In [65]:
# Combine NumPy and Pandas to create a dataframe; note textbook had an error on the concat line concatenating to df
np.random.seed(24)
dFrame = pd.DataFrame({'F': np.linspace(1, 10, 10)})
dFrame = pd.concat(
    [dFrame, pd.DataFrame(np.random.randn(10, 5), columns=list('EDCBA'))],
    axis=1,
)
dFrame.iloc[0, 2] = np.nan
dFrame

      F         E         D         C         B         A
0   1.0  1.329212       NaN -0.316280 -0.990810 -1.070816
1   2.0 -1.438713  0.564417  0.295722 -1.626404  0.219565
2   3.0  0.678805  1.889273  0.961538  0.104011 -0.481165
3   4.0  0.850229  1.453425  1.057737  0.165562  0.515018
4   5.0 -1.336936  0.562861  1.392855 -0.063328  0.121668
5   6.0  1.207603 -0.002040  1.627796  0.354493  1.037528
6   7.0 -0.385684  0.519818  1.686583 -1.325963  1.428984
7   8.0 -2.089354 -0.129820  0.631523 -0.586538  0.290720
8   9.0  1.264103  0.290035 -1.970288  0.803906  1.030550
9  10.0  0.118098 -0.021853  0.046841 -1.628753 -0.392361


In [67]:
# Define a function that should color the values that are less than 0
def colorNegativeValueToRed(value):
    if value < 0:
        color = 'red'
    elif value > 0:
        color = 'black'
    else:
        color = 'green'
    return 'color: %s' % color

In [68]:
# Pass the 'colorNegativeValueToRed' function into the dataframe
# Note: pandas 2.1+ deprecates Styler.applymap in favor of Styler.map
s = dFrame.style.applymap(colorNegativeValueToRed,
                          subset=['A', 'B', 'C', 'D', 'E'])
s


      F         E         D         C         B         A
0   1.0  1.329212       NaN -0.316280 -0.990810 -1.070816
1   2.0 -1.438713  0.564417  0.295722 -1.626404  0.219565
2   3.0  0.678805  1.889273  0.961538  0.104011 -0.481165
3   4.0  0.850229  1.453425  1.057737  0.165562  0.515018
4   5.0 -1.336936  0.562861  1.392855 -0.063328  0.121668
5   6.0  1.207603 -0.002040  1.627796  0.354493  1.037528
6   7.0 -0.385684  0.519818  1.686583 -1.325963  1.428984
7   8.0 -2.089354 -0.129820  0.631523 -0.586538  0.290720
8   9.0  1.264103  0.290035 -1.970288  0.803906  1.030550
9  10.0  0.118098 -0.021853  0.046841 -1.628753 -0.392361


In [69]:
# Scan column and highlight the max and min in the column
def highlightMax(s):
    isMax = s == s.max()
    return ['background-color: orange' if v else '' for v in isMax]

def highlightMin(s):
    isMin = s == s.min()
    return ['background-color: green' if v else '' for v in isMin]

In [72]:
# Apply these two functions to the dataframe; errors again in the text
styled_df = dFrame.style.apply(highlightMax).apply(highlightMin).highlight_null(color='red')
styled_df

      F         E         D         C         B         A
0   1.0  1.329212       NaN -0.316280 -0.990810 -1.070816
1   2.0 -1.438713  0.564417  0.295722 -1.626404  0.219565
2   3.0  0.678805  1.889273  0.961538  0.104011 -0.481165
3   4.0  0.850229  1.453425  1.057737  0.165562  0.515018
4   5.0 -1.336936  0.562861  1.392855 -0.063328  0.121668
5   6.0  1.207603 -0.002040  1.627796  0.354493  1.037528
6   7.0 -0.385684  0.519818  1.686583 -1.325963  1.428984
7   8.0 -2.089354 -0.129820  0.631523 -0.586538  0.290720
8   9.0  1.264103  0.290035 -1.970288  0.803906  1.030550
9  10.0  0.118098 -0.021853  0.046841 -1.628753 -0.392361


In [74]:
colorMap = sns.light_palette("pink", as_cmap=True)
sns_style_df = dFrame.style.background_gradient(cmap=colorMap)
sns_style_df

      F         E         D         C         B         A
0   1.0  1.329212       NaN -0.316280 -0.990810 -1.070816
1   2.0 -1.438713  0.564417  0.295722 -1.626404  0.219565
2   3.0  0.678805  1.889273  0.961538  0.104011 -0.481165
3   4.0  0.850229  1.453425  1.057737  0.165562  0.515018
4   5.0 -1.336936  0.562861  1.392855 -0.063328  0.121668
5   6.0  1.207603 -0.002040  1.627796  0.354493  1.037528
6   7.0 -0.385684  0.519818  1.686583 -1.325963  1.428984
7   8.0 -2.089354 -0.129820  0.631523 -0.586538  0.290720
8   9.0  1.264103  0.290035 -1.970288  0.803906  1.030550
9  10.0  0.118098 -0.021853  0.046841 -1.628753 -0.392361


# SciPy

SciPy is an open-source scientific library for Python. It depends on the NumPy library, which provides efficient n-dimensional array manipulation. We are going to use and learn more about both libraries in the upcoming chapters.

# Matplotlib

Matplotlib provides a huge library of customizable plots, along with a comprehensive set of backends. It can be used to create professional reporting applications, interactive analytical applications, complex dashboard applications, web/GUI applications, embedded views, and much more.