[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ipgromov/machine_learning_project_attachments/blob/master/MLBP2018_Project_Report.ipynb)

**Machine Learning Basic Principles 2018 - Data Analysis Project Report**

*All text in italics contains instructions for filling in the template - remove it when writing the project report!*

# *Title* 

*The title should be concise and informative and should describe the approach used to solve the problem. Some good titles from previous years:*

*- Comparing extreme learning machines and naive bayes’ classifier in spam detection*

*- Using linear discriminant analysis in spam detection*

*Some not-so-good titles:*

*- Bayesian spam filtering with extras*

*- Two-component classifier for spam detection*

*- CS-E3210 Term Project, final report*




## Abstract

*A precise summary of the whole report that previews its contents and results. It must be a single paragraph of 100 to 200 words.*



## 1. Introduction

*Background, problem statement, motivation, many references, description of
contents. Introduces the reader to the topic and the broad context within which your
research/project fits.*

*- What do you hope to learn from the project?*
*- What question is being addressed?*
*- Why is this task important? (motivation)*

*Keep it short (half to 1 page).*



## 2. Data analysis

*Briefly describe the data (class distribution, dimensionality) and how it will affect
classification. Visualize the data. Don’t focus too much on the meaning of the features,
unless you want to.*

*- Include histograms showing class distribution.*



The following code cells connect to the Kaggle account and download the data files to the "kaggle_data" directory of this notebook *(to be deleted later - only needed for Google Colab)*.


In [1]:
# Install Kaggle package to the environment to access the data through the API
!pip install kaggle


In [2]:
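# Set the API credentials as environment variables.
# Hard-coding a real key in a notebook is unsafe: anyone who can read the notebook can use it.
# Replace the placeholder with your own key, and never share a notebook that contains it.
%env KAGGLE_USERNAME=addableowl
%env KAGGLE_KEY=<your-kaggle-api-key>

env: KAGGLE_USERNAME=addableowl
env: KAGGLE_KEY=<your-kaggle-api-key>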


In [3]:
# Downloading and unzipping the data files
!kaggle competitions download -c mlbp-data-analysis-challenge-accuracy-2018 -f test_data.csv -p kaggle_data
!unzip kaggle_data/test_data.csv.zip -d kaggle_data
!kaggle competitions download -c mlbp-data-analysis-challenge-accuracy-2018 -f train_data.csv -p kaggle_data
!unzip kaggle_data/train_data.csv.zip -d kaggle_data
!kaggle competitions download -c mlbp-data-analysis-challenge-accuracy-2018 -f train_labels.csv -p kaggle_data

Downloading test_data.csv.zip to kaggle_data
100% 4.87M/4.87M [00:00<00:00, 151MB/s]
Archive:  kaggle_data/test_data.csv.zip
  inflating: kaggle_data/test_data.csv  
Downloading train_data.csv.zip to kaggle_data
100% 3.25M/3.25M [00:00<00:00, 168MB/s]
Archive:  kaggle_data/train_data.csv.zip
  inflating: kaggle_data/train_data.csv  
Downloading train_labels.csv to kaggle_data
100% 8.61k/8.61k [00:00<00:00, 5.32MB/s]
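
The credentials cell above sets the API key directly in the notebook. A safer alternative, sketched here under the assumption that the standard Kaggle CLI configuration file is used, is to store the credentials in `~/.kaggle/kaggle.json` with owner-only permissions and to type the key in at runtime. The helper name `write_kaggle_credentials` is hypothetical.

```python
import json
import os


def write_kaggle_credentials(username, key, config_dir=None):
    """Hypothetical helper: store Kaggle API credentials in <config_dir>/kaggle.json.

    By default the file is written to ~/.kaggle, the default location the
    Kaggle CLI reads its credentials from.
    """
    if config_dir is None:
        config_dir = os.path.join(os.path.expanduser('~'), '.kaggle')
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, 'kaggle.json')
    with open(path, 'w') as f:
        json.dump({'username': username, 'key': key}, f)
    # Owner read/write only; the Kaggle CLI warns if the key is readable by other users
    os.chmod(path, 0o600)
    return path
```

In the notebook it could replace the two `%env` lines, e.g. `from getpass import getpass` followed by `write_kaggle_credentials('addableowl', getpass('Kaggle API key: '))`, so the key never appears in the saved notebook.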


Now we can finally get our hands on the data!


In [0]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [0]:
# Load the data
train_df = pd.read_csv('kaggle_data/train_data.csv', header=None)
train_labels = pd.read_csv('kaggle_data/train_labels.csv', header=None)
test_df = pd.read_csv('kaggle_data/test_data.csv', header=None)

The feature space of the dataset is shown in this figure:
![Feature space of the dataset](https://github.com/ipgromov/Machine-Learning-Project/blob/master/images/feature_space.png?raw=true)
The feature extraction process is described below:
![Feature extraction process](https://github.com/ipgromov/Machine-Learning-Project/blob/master/images/feature_extraction.png?raw=true)

In [0]:
# Add an artificially generated 'id' column (0, 1, 2, ...) as the first column
# of each dataframe. The column is kept as an ordinary column rather than the
# index (set_index returns a new DataFrame, so calling it without assigning the
# result would have no effect), which is why 'id' appears in the outputs below.
# It must therefore be dropped before the features are passed to a model.

if 'id' not in train_df:
  train_df.insert(0, 'id', range(train_df.shape[0]))

if 'id' not in train_labels:
  train_labels.insert(0, 'id', range(train_labels.shape[0]))

if 'id' not in test_df:
  test_df.insert(0, 'id', range(test_df.shape[0]))

In [0]:
# Create a function that generates meaningful names for the columns.
# The naming follows the pattern "<component>_<index>_<statistic>", e.g. "r_1_mean".
# The loops assume the raw columns are ordered component by component, then statistic
# by statistic, then coefficient 1..n (and rely on dicts keeping insertion order, Python 3.7+).

def generate_col_names():
  col_names = []
  # component -> (number of coefficients, statistics computed for each coefficient)
  components = {
      'Rhythm': (24, ('mean', 'median', 'variance', 'kurtosis', 'skewness', 'min', 'max')),
      'Chroma': (12, ('mean', 'std', 'min', 'max')),
      'MFCC': (12, ('mean', 'std', 'min', 'max')),
  }
  abbreviations = {'Rhythm': 'r', 'Chroma': 'c', 'MFCC': 'm'}

  for component, (n_coefficients, statistics) in components.items():
    for statistic in statistics:
      for i in range(1, n_coefficients + 1):
        col_names.append('{}_{}_{}'.format(abbreviations[component], i, statistic))

  return col_names

In [0]:
# Call the function to get column names and then create a dictionary ( like 0 -> r_1_mean ... ) which can be used in df.rename method

col_names = generate_col_names()
col_names_dict = dict(zip(range(len(col_names)), col_names))

train_df = train_df.rename(col_names_dict, axis='columns')
test_df = test_df.rename(col_names_dict, axis='columns')

# Rename the column in train labels as 'gt' (short for groundtruth)

train_labels = train_labels.rename({0 : 'gt'}, axis='columns')
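
The loops in `generate_col_names` give the number of features per sample directly: each component contributes (number of coefficients) × (number of statistics) columns.

$$
\underbrace{24 \times 7}_{\text{Rhythm}} + \underbrace{12 \times 4}_{\text{Chroma}} + \underbrace{12 \times 4}_{\text{MFCC}} = 168 + 48 + 48 = 264
$$

Together with the artificial `id` column this gives the 265 columns of the split dataframes whose shapes are printed later in this section.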

Observations about training data:

In [9]:
# Observe the characteristics of the features in the dataset

train_description = train_df.describe()
train_description

| | id | r_1_mean | r_2_mean | r_3_mean | r_4_mean | r_5_mean | r_6_mean | r_7_mean | r_8_mean | r_9_mean | ... | m_3_max | m_4_max | m_5_max | m_6_max | m_7_max | m_8_max | m_9_max | m_10_max | m_11_max | m_12_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4363.0 | 4363.0 | 4363.0 | 4363.0 | 4363.0 | 4363.0 | 4363.0 | 4363.0 | 4363.0 | 4363.0 | ... | 4363.0 | 4363.0 | 4363.0 | 4363.0 | 4363.0 | 4363.0 | 4363.0 | 4363.0 | 4363.0 | 4363.0 |
| mean | 2181.0 | 3097.683714 | 4390.947408 | 3987.672465 | 4004.861996 | 3086.664978 | 3329.815872 | 3043.628357 | 3034.574066 | 2671.463266 | ... | 0.148412 | 0.235156 | 0.099695 | 0.149826 | 0.0791 | 0.102965 | 0.070939 | 0.078456 | 0.070708 | 0.059353 |
| std | 1259.633942 | 1309.219331 | 1457.625815 | 1228.185868 | 1242.336635 | 1031.020501 | 1124.845689 | 1097.705493 | 1121.064034 | 1017.781965 | ... | 0.053224 | 0.102935 | 0.038655 | 0.057284 | 0.026279 | 0.0303 | 0.020456 | 0.022035 | 0.026422 | 0.018123 |
| min | 0.0 | 1.24 | 3.565 | 73.644 | 147.37 | 58.027 | 136.67 | 139.31 | 157.26 | 118.53 | ... | -0.013915 | -0.013511 | -0.022156 | -0.046172 | 0.003606 | 0.005359 | 0.003377 | 0.011343 | 0.005302 | -0.007228 |
| 25% | 1090.5 | 2200.9 | 3386.15 | 3128.65 | 3173.2 | 2373.0 | 2551.05 | 2264.8 | 2233.15 | 1923.2 | ... | 0.111165 | 0.14839 | 0.070612 | 0.109415 | 0.061715 | 0.08187 | 0.056992 | 0.062765 | 0.051641 | 0.046992 |
| 50% | 2181.0 | 3114.6 | 4376.8 | 3988.5 | 3976.3 | 3041.3 | 3276.4 | 2960.9 | 2956.8 | 2601.6 | ... | 0.14403 | 0.225 | 0.097368 | 0.14621 | 0.075773 | 0.10127 | 0.069202 | 0.077442 | 0.066268 | 0.057879 |
| 75% | 3271.5 | 3949.15 | 5391.9 | 4814.95 | 4795.3 | 3741.4 | 4055.7 | 3769.1 | 3758.1 | 3347.2 | ... | 0.180845 | 0.325125 | 0.123075 | 0.187625 | 0.092576 | 0.121835 | 0.083009 | 0.093005 | 0.085709 | 0.069603 |
| max | 4362.0 | 9172.4 | 9062.5 | 8318.1 | 9342.2 | 8275.7 | 8169.5 | 7547.6 | 8262.2 | 8667.9 | ... | 0.38305 | 0.48523 | 0.30129 | 0.35241 | 0.24601 | 0.2356 | 0.195 | 0.17001 | 0.18218 | 0.14977 |


In [0]:
# Export train_description as an Excel file (requires the openpyxl package)

# !pip install openpyxl

# train_description.to_excel('output.xlsx', sheet_name='Sheet1')

In [11]:
# Observe the first five rows of the dataframe
train_df.head()

| | id | r_1_mean | r_2_mean | r_3_mean | r_4_mean | r_5_mean | r_6_mean | r_7_mean | r_8_mean | r_9_mean | ... | m_3_max | m_4_max | m_5_max | m_6_max | m_7_max | m_8_max | m_9_max | m_10_max | m_11_max | m_12_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1040.7 | 2315.6 | 2839.1 | 2552.2 | 2290.4 | 1913.8 | 2152.6 | 1930.3 | 2079.3 | ... | 0.21649 | 0.36548 | 0.093584 | 0.16687 | 0.083426 | 0.11809 | 0.089792 | 0.074371 | 0.073162 | 0.059463 |
| 1 | 1 | 2309.4 | 4780.4 | 4055.7 | 3120.5 | 1979.9 | 2343.6 | 2634.2 | 3208.5 | 3078.0 | ... | 0.10067 | 0.14739 | 0.10256 | 0.21304 | 0.082041 | 0.080967 | 0.07645 | 0.052523 | 0.052357 | 0.055297 |
| 2 | 2 | 2331.9 | 4607.0 | 4732.3 | 5007.0 | 3164.9 | 3171.9 | 2915.7 | 3282.3 | 2400.0 | ... | 0.12676 | 0.36321 | 0.1142 | 0.22378 | 0.10077 | 0.18691 | 0.06727 | 0.061138 | 0.085509 | 0.049422 |
| 3 | 3 | 3350.9 | 6274.4 | 5037.0 | 4609.7 | 3438.8 | 3925.8 | 3746.4 | 3539.4 | 3053.7 | ... | 0.096479 | 0.2895 | 0.074124 | 0.20158 | 0.049032 | 0.13021 | 0.0458 | 0.080885 | 0.14891 | 0.042027 |
| 4 | 4 | 2017.6 | 3351.8 | 2924.9 | 2726.3 | 1979.9 | 1930.9 | 2083.4 | 1889.2 | 1695.4 | ... | 0.13834 | 0.38266 | 0.079402 | 0.063495 | 0.053717 | 0.08675 | 0.06209 | 0.048999 | 0.033159 | 0.070813 |


In [12]:
# ... and the last five rows
train_df.tail()

| | id | r_1_mean | r_2_mean | r_3_mean | r_4_mean | r_5_mean | r_6_mean | r_7_mean | r_8_mean | r_9_mean | ... | m_3_max | m_4_max | m_5_max | m_6_max | m_7_max | m_8_max | m_9_max | m_10_max | m_11_max | m_12_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4358 | 4358 | 3050.6 | 6570.7 | 5138.4 | 3620.7 | 3059.7 | 3909.8 | 3408.6 | 2232.5 | 2344.7 | ... | 0.23154 | 0.28806 | 0.067744 | 0.094464 | 0.036829 | 0.091753 | 0.073977 | 0.10275 | 0.073559 | 0.059759 |
| 4359 | 4359 | 4356.6 | 2882.6 | 2349.8 | 2628.1 | 2838.8 | 3846.1 | 3490.6 | 3023.8 | 3023.5 | ... | 0.16211 | 0.27834 | 0.11086 | 0.22943 | 0.091665 | 0.080436 | 0.083978 | 0.09652 | 0.080711 | 0.064529 |
| 4360 | 4360 | 2624.2 | 3688.7 | 2128.4 | 2318.7 | 1855.8 | 2002.8 | 1277.8 | 1391.8 | 1459.9 | ... | 0.23888 | 0.28705 | 0.12068 | 0.20085 | 0.10545 | 0.082589 | 0.079226 | 0.065561 | 0.052131 | 0.080473 |
| 4361 | 4361 | 2751.0 | 3767.4 | 3858.7 | 6012.7 | 3201.4 | 3604.1 | 3332.9 | 3119.4 | 2464.1 | ... | 0.114 | 0.17835 | 0.13904 | 0.12196 | 0.068742 | 0.11347 | 0.076305 | 0.056053 | 0.042466 | 0.057299 |
| 4362 | 4362 | 4622.2 | 4410.2 | 2117.4 | 1533.8 | 1180.4 | 1164.3 | 1175.7 | 1005.4 | 906.58 | ... | 0.071 | 0.30119 | 0.057491 | 0.18763 | 0.080588 | 0.1287 | 0.036234 | 0.11781 | 0.098176 | 0.03236 |


Observations about train labels:

In [13]:
train_labels.describe()

| | id | gt |
|---|---|---|
| count | 4363.0 | 4363.0 |
| mean | 2181.0 | 2.812056 |
| std | 1259.633942 | 2.500889 |
| min | 0.0 | 1.0 |
| 25% | 1090.5 | 1.0 |
| 50% | 2181.0 | 2.0 |
| 75% | 3271.5 | 4.0 |
| max | 4362.0 | 10.0 |


In [14]:
train_labels.head()

| | id | gt |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
| 4 | 4 | 1 |


In [15]:
train_labels.tail()

| | id | gt |
|---|---|---|
| 4358 | 4358 | 2 |
| 4359 | 4359 | 3 |
| 4360 | 4360 | 2 |
| 4361 | 4361 | 5 |
| 4362 | 4362 | 2 |


Observations about test data:

In [16]:
test_df.describe()

| | id | r_1_mean | r_2_mean | r_3_mean | r_4_mean | r_5_mean | r_6_mean | r_7_mean | r_8_mean | r_9_mean | ... | m_3_max | m_4_max | m_5_max | m_6_max | m_7_max | m_8_max | m_9_max | m_10_max | m_11_max | m_12_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 6544.0 | 6544.0 | 6544.0 | 6544.0 | 6544.0 | 6544.0 | 6544.0 | 6544.0 | 6544.0 | 6544.0 | ... | 6544.0 | 6544.0 | 6544.0 | 6544.0 | 6544.0 | 6544.0 | 6544.0 | 6544.0 | 6544.0 | 6544.0 |
| mean | 3271.5 | 3058.037015 | 4355.87359 | 3932.126318 | 3951.374646 | 3047.596245 | 3274.043362 | 3001.38357 | 2985.446799 | 2611.9209 | ... | 0.149118 | 0.235902 | 0.098636 | 0.149432 | 0.078883 | 0.102099 | 0.070408 | 0.078485 | 0.070002 | 0.059404 |
| std | 1889.234413 | 1311.270031 | 1486.928587 | 1258.432257 | 1250.45553 | 1034.248602 | 1134.089032 | 1107.284447 | 1125.748255 | 1004.59801 | ... | 0.053524 | 0.103901 | 0.037731 | 0.057528 | 0.025469 | 0.030438 | 0.020272 | 0.022191 | 0.026132 | 0.018306 |
| min | 0.0 | 1.066 | 1.066 | 1.066 | 1.066 | 1.066 | 1.066 | 1.066 | 1.066 | 1.066 | ... | -0.23635 | -0.11269 | -0.03128 | -0.026286 | 0.00357 | 0.015833 | -0.01066 | 0.012256 | 0.010891 | 0.001152 |
| 25% | 1635.75 | 2105.9 | 3310.025 | 3062.025 | 3092.8 | 2324.45 | 2486.65 | 2226.4 | 2188.275 | 1894.775 | ... | 0.112285 | 0.147525 | 0.069917 | 0.108408 | 0.061924 | 0.080391 | 0.056767 | 0.062885 | 0.051215 | 0.046886 |
| 50% | 3271.5 | 3090.15 | 4327.8 | 3922.1 | 3924.35 | 3009.05 | 3210.0 | 2925.0 | 2903.35 | 2546.55 | ... | 0.144455 | 0.22597 | 0.095175 | 0.1462 | 0.076156 | 0.100175 | 0.068897 | 0.077103 | 0.06516 | 0.057507 |
| 75% | 4907.25 | 3920.275 | 5376.3 | 4769.7 | 4776.475 | 3717.6 | 3979.65 | 3704.8 | 3700.3 | 3246.825 | ... | 0.18012 | 0.326053 | 0.120692 | 0.187967 | 0.092706 | 0.121443 | 0.083066 | 0.092811 | 0.08382 | 0.069989 |
| max | 6543.0 | 9564.2 | 9898.4 | 9965.8 | 8971.1 | 7325.8 | 8842.9 | 8168.2 | 8045.7 | 8034.1 | ... | 0.37969 | 0.47775 | 0.31733 | 0.34847 | 0.25988 | 0.24215 | 0.17832 | 0.18725 | 0.18048 | 0.17285 |


In [17]:
test_df.head()

| | id | r_1_mean | r_2_mean | r_3_mean | r_4_mean | r_5_mean | r_6_mean | r_7_mean | r_8_mean | r_9_mean | ... | m_3_max | m_4_max | m_5_max | m_6_max | m_7_max | m_8_max | m_9_max | m_10_max | m_11_max | m_12_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3115.5 | 3151.9 | 2742.2 | 3236.8 | 2580.6 | 2662.6 | 2372.2 | 2569.6 | 2310.7 | ... | 0.202 | 0.27811 | 0.13829 | 0.16759 | 0.10669 | 0.135 | 0.087223 | 0.11503 | 0.074181 | 0.082354 |
| 1 | 1 | 2055.0 | 3225.7 | 2273.5 | 3601.9 | 1959.3 | 2212.3 | 1720.4 | 1640.4 | 1157.1 | ... | 0.095165 | 0.17908 | 0.1343 | 0.14612 | 0.062988 | 0.11378 | 0.076223 | 0.10532 | 0.10475 | 0.0674 |
| 2 | 2 | 1601.2 | 3768.1 | 3591.0 | 3452.4 | 2935.9 | 2913.4 | 2479.6 | 2108.8 | 1823.8 | ... | 0.11455 | 0.30952 | 0.096229 | 0.10636 | 0.037123 | 0.085613 | 0.051728 | 0.074393 | 0.047407 | 0.043231 |
| 3 | 3 | 2745.0 | 4585.8 | 4246.0 | 3937.6 | 3423.9 | 4108.2 | 3096.5 | 3469.6 | 3002.5 | ... | 0.1123 | 0.42906 | 0.057022 | 0.1661 | 0.057594 | 0.12736 | 0.017878 | 0.075741 | 0.09206 | 0.028369 |
| 4 | 4 | 1515.0 | 2403.5 | 2461.6 | 2399.1 | 1999.4 | 2195.8 | 2123.0 | 2030.7 | 1737.4 | ... | 0.098629 | 0.27692 | 0.057491 | 0.24435 | 0.03983 | 0.050223 | 0.045778 | 0.078015 | 0.067199 | 0.043448 |


In [18]:
test_df.tail()

| | id | r_1_mean | r_2_mean | r_3_mean | r_4_mean | r_5_mean | r_6_mean | r_7_mean | r_8_mean | r_9_mean | ... | m_3_max | m_4_max | m_5_max | m_6_max | m_7_max | m_8_max | m_9_max | m_10_max | m_11_max | m_12_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6539 | 6539 | 5458.1 | 3816.1 | 2762.7 | 2198.3 | 1591.3 | 1135.3 | 758.03 | 1031.9 | 815.45 | ... | 0.077645 | 0.21367 | 0.075442 | 0.079 | 0.12722 | 0.099362 | 0.073899 | 0.040763 | 0.05166 | 0.064701 |
| 6540 | 6540 | 1597.6 | 2797.1 | 2198.8 | 1969.8 | 1889.4 | 2328.2 | 2289.2 | 2466.7 | 2086.0 | ... | 0.26681 | 0.3559 | 0.13176 | 0.08636 | 0.078558 | 0.1225 | 0.060723 | 0.07858 | 0.027565 | 0.061399 |
| 6541 | 6541 | 1372.0 | 2186.0 | 1860.4 | 1932.1 | 1470.7 | 2403.0 | 2242.6 | 2227.6 | 2006.0 | ... | 0.22813 | 0.23255 | 0.11253 | 0.074153 | 0.080115 | 0.055913 | 0.053767 | 0.052231 | 0.048352 | 0.036212 |
| 6542 | 6542 | 3921.2 | 4255.6 | 3193.0 | 4426.2 | 2868.7 | 3070.7 | 2640.0 | 2507.3 | 2229.4 | ... | 0.10882 | 0.33653 | 0.096614 | 0.092141 | 0.074061 | 0.12656 | 0.058021 | 0.082832 | 0.076089 | 0.055648 |
| 6543 | 6543 | 1989.6 | 3771.4 | 3394.6 | 2957.2 | 2375.6 | 2747.8 | 2891.2 | 2420.2 | 2391.2 | ... | 0.12455 | 0.11777 | 0.11448 | 0.18934 | 0.086654 | 0.10411 | 0.041426 | 0.071077 | 0.050447 | 0.059275 |


**The meanings of labels:**
1.  'Pop_Rock'
2.  'Electronic'
3.  'Rap'
4.  'Jazz'
5.  'Latin'
6.  'RnB'
7.  'International'
8.  'Country'
9.  'Reggae'
10. 'Blues'

In [19]:
# Count how many training samples carry each label 1 to 10
# (a text version of the class-distribution histogram)

# DataFrame.as_matrix() is deprecated (removed in pandas 1.0); use .values instead
label_values = train_labels['gt'].values

frequencies = []
for value in range(1, label_values.max() + 1):
  frequency = (label_values == value).sum()
  frequencies.append(int(frequency))

print(frequencies)


[2178, 618, 326, 253, 214, 260, 141, 195, 92, 86]
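
The template asks for a histogram of the class distribution. A minimal matplotlib sketch, assuming the label-to-genre mapping listed above and a hypothetical helper name `plot_class_distribution`, draws the counts as a bar chart:

```python
import matplotlib.pyplot as plt
import numpy as np

# Genre names for labels 1..10, in the order given in "The meanings of labels"
GENRES = ['Pop_Rock', 'Electronic', 'Rap', 'Jazz', 'Latin',
          'RnB', 'International', 'Country', 'Reggae', 'Blues']


def plot_class_distribution(labels, genres=GENRES):
    """Bar chart of how many samples carry each label 1..len(genres).

    Returns the figure and the per-class counts.
    """
    labels = np.asarray(labels)
    # bincount index 0 is unused because the labels start at 1
    counts = np.bincount(labels, minlength=len(genres) + 1)[1:len(genres) + 1]
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(genres, counts)
    ax.set_xlabel('Genre')
    ax.set_ylabel('Number of training samples')
    ax.set_title('Class distribution of the training labels')
    ax.tick_params(axis='x', labelrotation=45)
    fig.tight_layout()
    return fig, counts
```

Calling `plot_class_distribution(train_labels['gt'])` should draw one bar per genre, with heights equal to the frequencies printed above, which makes the dominance of Pop_Rock easy to see.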


In [0]:
from random import sample

In [0]:
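def split_data(data, percentage = 0.8):
  # Randomly split the rows of `data` into two parts: a fraction `percentage`
  # of the rows and the remaining rows. Only `data` itself is split; the
  # matching labels can be looked up through the 'id' column.
  l = len(data)  # number of rows
  f = round(percentage * l)  # number of rows in the first part
  indices = sample(range(l), f)  # random row positions, drawn without replacement

  train_data = data.iloc[indices]
  test_data = data.drop(data.index[indices])

  return train_data, test_data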

In [0]:
local_train_df, local_test_df = split_data(data = train_df)

In [56]:
print(local_train_df.shape)
print(local_test_df.shape)

(3490, 265)
(873, 265)
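
`split_data` splits only the feature dataframe and draws rows uniformly at random, so the labels have to be matched through the `id` column, and a small class such as Blues (86 samples) may end up under-represented in the local test part. A sketch that instead uses scikit-learn's `train_test_split` with `stratify` keeps features and labels aligned and preserves the class proportions in both parts; the wrapper name `stratified_split` and the fixed seed are assumptions.

```python
from sklearn.model_selection import train_test_split


def stratified_split(features, labels, test_size=0.2, seed=0):
    """Split features and labels together, keeping the class proportions of `labels`.

    Returns X_train, X_test, y_train, y_test (the order used by train_test_split).
    """
    return train_test_split(features, labels, test_size=test_size,
                            stratify=labels, random_state=seed)
```

In this notebook it could be called as `X_tr, X_val, y_tr, y_val = stratified_split(train_df.drop(columns='id'), train_labels['gt'])`.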


## 3. Methods and experiments

*- Explain your whole approach (you can include a block diagram showing the steps in your process).* 

*- What methods/algorithms did you use, and why were they chosen?*

*- What evaluation methodology did you use (cross-validation, etc.)?*



In [0]:
# Try logistic regression
import sklearn.linear_model as lm

In [22]:
lr = lm.LogisticRegression()
# Drop the artificial 'id' column so that it is not used as a feature
lr.fit(X=train_df.drop(columns='id'), y=train_labels['gt'])

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [0]:
y_pred = lr.predict(test_df.drop(columns='id'))

In [0]:
submission = np.vstack((np.array(range(1, len(y_pred)+1)), y_pred))
submission_df = pd.DataFrame(submission.transpose(), columns=['Sample_id', 'Sample_label'])
submission_df.to_csv('submission.csv', index=False)

In [0]:
#definition of function that creates submission file for accuracy competition
def create_submission_accuracy(y_pred, filename):
  submission = np.vstack((np.array(range(1, len(y_pred)+1)), y_pred))
  submission_df = pd.DataFrame(submission.transpose(), columns=['Sample_id', 'Sample_label'])
  submission_df.to_csv(filename, index=False)
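
`create_submission_accuracy` writes only the predicted labels, while LogLoss is computed from class probabilities. A hedged sketch of a corresponding writer follows; it assumes the log-loss competition expects a `Sample_id` column followed by one probability column per class named `Class_1` to `Class_10`, which should be checked against the competition's sample submission. The function name `create_submission_logloss` is hypothetical.

```python
import numpy as np
import pandas as pd


def create_submission_logloss(model, X_test, filename=None):
    """Hypothetical log-loss submission: Sample_id plus one probability column per class.

    The columns of predict_proba follow model.classes_, so the column names are
    derived from it to keep each probability labelled with its class.
    """
    proba = model.predict_proba(X_test)
    columns = ['Class_{}'.format(c) for c in model.classes_]
    submission_df = pd.DataFrame(proba, columns=columns)
    # Sample ids start at 1, as in create_submission_accuracy
    submission_df.insert(0, 'Sample_id', np.arange(1, len(proba) + 1))
    if filename is not None:
        submission_df.to_csv(filename, index=False)
    return submission_df
```

For example, `create_submission_logloss(lr, test_df.drop(columns='id'), 'submission_logloss.csv')` would write the probabilities of the logistic-regression model above.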

In [27]:
# We should also try feature selection, which is different from dimensionality reduction:
# dimensionality reduction (e.g. PCA) creates new features as combinations of the original ones,
# while feature selection keeps a subset of the original features unchanged.

# Feature selection with univariate statistical tests
from sklearn.feature_selection import SelectKBest, SelectPercentile
from sklearn.feature_selection import chi2, f_classif

# The chi2 test cannot be used here because it requires non-negative feature values,
# and some features (e.g. the MFCC statistics) take negative values.
# Therefore the ANOVA F-test (f_classif) is used as the score function instead.

#test = SelectKBest(score_func=chi2, k=4)
#fit = test.fit(train_df, train_labels["gt"])
#features = fit.transform(train_df)

# Keep the 10% of features with the highest F-scores (the 'id' column is excluded)
selector = SelectPercentile(f_classif, percentile=10)
selected_feature_dataset = selector.fit_transform(train_df.drop(columns='id'), train_labels["gt"])
print("shape of selected dataset: ", selected_feature_dataset.shape)

#TODO: implement all these selection / dimensionality reduction algorithms as functions

shape of selected dataset:  (4363, 27)
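
For reference, the score that `f_classif` computes for each feature is the one-way ANOVA F-statistic. For a single feature with values $x_i$, $K$ classes $C_1, \dots, C_K$ of sizes $n_k$, class means $\bar{x}_k$, overall mean $\bar{x}$ and $N$ samples in total:

$$
F = \frac{\mathrm{MSB}}{\mathrm{MSW}}
  = \frac{\frac{1}{K-1}\sum_{k=1}^{K} n_k\,(\bar{x}_k - \bar{x})^2}
         {\frac{1}{N-K}\sum_{k=1}^{K}\sum_{i \in C_k} (x_i - \bar{x}_k)^2}
$$

Here $K = 10$ and $N = 4363$, so the degrees of freedom are $9$ and $4353$. A large $F$ means that the class means differ strongly relative to the spread within the classes. If a feature has zero within-class variance (for example a constant feature), the denominator is zero and the score is infinite or undefined, in which case scikit-learn may emit a RuntimeWarning for the division.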




In [0]:
#feature selection with recursive feature elimination
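
The recursive feature elimination cell above is still empty. One possible sketch uses scikit-learn's `RFE` with a logistic-regression estimator: the model is refit repeatedly, and each round the `step` features with the smallest coefficient magnitudes are dropped until `n_features` remain. The helper name, the default of 27 features (chosen to match the `SelectPercentile` result), the step size and the standardisation are assumptions, not the report's method.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def select_features_rfe(X, y, n_features=27, step=10):
    """Recursive feature elimination with a logistic-regression ranker.

    Returns a boolean mask of the kept features and the RFE ranking (1 = kept).
    """
    # Standardise first: RFE ranks features by the magnitude of the model
    # coefficients, which is only comparable when the features share a scale.
    X_scaled = StandardScaler().fit_transform(np.asarray(X, dtype=float))
    rfe = RFE(LogisticRegression(max_iter=1000),
              n_features_to_select=n_features, step=step)
    rfe.fit(X_scaled, y)
    return rfe.support_, rfe.ranking_
```

Applied here, `mask, ranking = select_features_rfe(train_df.drop(columns='id'), train_labels['gt'])` followed by `train_df.drop(columns='id').columns[mask]` would list the kept feature names. RFE refits the model many times, so it is noticeably slower than the univariate `SelectPercentile`.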

In [0]:
from sklearn.decomposition import PCA

# Dimensionality reduction with PCA: keep enough principal components to explain
# at least 95% of the variance of the data the PCA is fitted on.
# The fitted PCA object is returned as well, so that exactly the same projection
# can later be applied to other data (e.g. the test set) with pca095.transform().
def reduce_dimension(dataset):
  pca095 = PCA(n_components=0.95)
  reduced_data = pca095.fit_transform(dataset)
  return reduced_data, pca095

In [0]:
# The test set must be projected with the PCA fitted on the training set.
# Fitting a separate PCA on the test set (e.g. a fixed 15-component PCA to match the
# training output) would project it onto different axes than those the classifier was trained on.
train_reduced, pca095 = reduce_dimension(train_df.drop(columns='id'))
test_reduced = pca095.transform(test_df.drop(columns='id'))

In [31]:
#try logistic regression with reduced dimensionality
lr_red = lm.LogisticRegression()
lr_red.fit(X=train_reduced, y=train_labels['gt'])

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [0]:
y_pred_red = lr_red.predict(test_reduced)


In [0]:
submission = np.vstack((np.array(range(1, len(y_pred_red)+1)), y_pred_red))
submission_df = pd.DataFrame(submission.transpose(), columns=['Sample_id', 'Sample_label'])
submission_df.to_csv('submission_red_new.csv', index=False)
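
The statistics in `train_description` live on very different scales: the rhythm means are in the thousands, while the MFCC maxima are around 0.1, so an unscaled PCA is dominated by the rhythm features. A possible variation, not used in the experiments above, standardises every feature before PCA. Wrapping the steps in a scikit-learn pipeline also guarantees that the scaler and the PCA are fitted on the training data only and then reused unchanged on the test data. The factory name `make_pca_classifier` is hypothetical.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_pca_classifier(variance=0.95):
    """Standardise -> PCA keeping `variance` of the variance -> logistic regression.

    fit() fits all three steps on the training data; predict() then applies
    the same scaling and projection to new data before classifying it.
    """
    return make_pipeline(StandardScaler(),
                         PCA(n_components=variance),
                         LogisticRegression(max_iter=1000))
```

Usage in this notebook could look like `model = make_pca_classifier().fit(train_df.drop(columns='id'), train_labels['gt'])` and then `create_submission_accuracy(model.predict(test_df.drop(columns='id')), 'submission_scaled_pca.csv')`, where the file name is only an example.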

## 4. Results

*Summarize the results of the experiments without discussing their implications.*

*- Include both performance measures (accuracy and LogLoss).*

*- How does it perform on kaggle compared to the train data.*

*- Include a confusion matrix.*



In [0]:
#Confusion matrix ...
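
The results section asks for accuracy, LogLoss and a confusion matrix. On a held-out part of the training data (for example from `split_data` or a stratified split), all three can be computed with `sklearn.metrics`. The helper name `evaluate` is hypothetical, and `model` is assumed to be any fitted classifier with `predict` and `predict_proba`, such as `lr` above.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, log_loss


def evaluate(model, X_val, y_val):
    """Accuracy, log loss and confusion matrix of a fitted classifier on held-out data.

    Rows of the confusion matrix are true classes and columns are predicted
    classes, both in the order of model.classes_.
    """
    y_pred = model.predict(X_val)
    proba = model.predict_proba(X_val)
    return {
        'accuracy': accuracy_score(y_val, y_pred),
        'log_loss': log_loss(y_val, proba, labels=model.classes_),
        'confusion_matrix': confusion_matrix(y_val, y_pred, labels=model.classes_),
    }
```

Comparing the local accuracy and log loss with the Kaggle leaderboard scores is one way to answer the template's question about how performance on Kaggle relates to the training data.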

## 5. Discussion/Conclusions

*Interpret and explain your results.*

*- Discuss the relevance of the performance measures (accuracy and LogLoss) for
imbalanced multiclass datasets.*

*- How do the results relate to the literature?*

*- Suggestions for future research/improvement.*

*- Did the study answer your questions?*



## 6. References

*List of all the references cited in the document*

## Appendix
*Any additional material needed to complete the report can be included here, for example additional source code, images or plots, or mathematical derivations. The content should be relevant to the report and should help explain or visualize something mentioned earlier.* ***You can remove the whole Appendix section if there is no need for it.***