# Milestone 2: Data collection, preprocessing, exploratory analysis
### On Tonal ambiguity and harmonic structure in Debussy’s piano music
#### By Ludovica Schaerf, Sabrina Laneve, Yuanhui Lin, Raphael Levier.

The notebook is organized as follows:
1. [Gathering the data](#Gathering-the-data)
2. [Data format](#Data-format)
3. [Exploratory analysis](#Exploratory-analysis)

### Gathering the data

Here you can show that you understand your data and know how to use it. You can be brief in your answers. Section 1 may be supported with informative plots.
-	Describe the dataset you selected and the information represented in it.
-	Based on your research question(s), why did you select this dataset?
-	What aspects of the researched phenomena does the data (not) represent?
-	Where did you get your data from?
-	How did you get it?
-	What is the maximum amount of data available in theory (in the case of incomplete data acquisition)?
-	If working on partial data, how representative is your sample of the full dataset?


### Data format

This section and the following one are the main reason why this milestone is to be delivered as a Jupyter notebook. Give insightful examples for each of the following questions by loading and transforming data samples.
-	What format(s) does the raw data come in?
-	How is the information that the dataset represents encoded in this format?
-	Load your dataset and show examples of how you access the information that you are interested in.
-	Give an overview of your dataset by plotting some basic statistics of the relevant features and/or metadata.


### Exploratory analysis

Perform an exploratory analysis on your data. In this section of your report please address the following points:
-	Which analyses or experiments did you perform as part of your exploratory analysis of the data and what are the results? Choose suitable visualizations to show your results.
-	How do the results relate to your hypotheses?
-	Did you find interesting or unexpected things? How do they influence the development of your project?
-	What problems did you encounter? How do you plan to deal with them?
-	Discuss how the data enables you to answer your research question(s).
-	Formulate educated guesses on your outcomes based on this data.
-	Reflect on how, at the end of the analysis, you will be able to tell whether these outcomes have manifested or not and how confident you will possibly be about this assessment.



In [None]:
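from wavescapes import (
    produce_pitch_class_matrix_from_filename,
    apply_dft_to_pitch_class_matrix,
    complex_utm_to_ws_utm,
    Wavescape,
)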
from matplotlib import pyplot as plt

# Transform the MIDI file into a list of pitch-class distributions, one per slice of the file.
# aw_size sets the slice length; assuming it is measured in quarter notes, each slice here spans 0.6 of a quarter note.
pc_mat = produce_pitch_class_matrix_from_filename(filepath = 'etude_01_(c)lefeldt.mid', aw_size = 0.6)

# Apply the DFT to each of the pitch-class distributions.
fourier_mat = apply_dft_to_pitch_class_matrix(pc_mat)

# Keep only the third Fourier coefficient and build the matrix holding all color-coded measurements.
coeff_mat = complex_utm_to_ws_utm(fourier_mat, coeff=3)

# Create a Wavescape instance that can draw this color matrix, with a plot width of 500 pixels.
ws = Wavescape(coeff_mat, pixel_width=500)

# Draw the plot as a matplotlib figure. In a notebook, the figure is displayed at the end of the cell.
ws.draw(tick_ratio=4)

# Save the drawn figure as a PNG image.
plt.savefig('3rd_coeff_wavescape.png')
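
To make the cell above more concrete, the following is a minimal sketch of what taking the third Fourier coefficient of a single pitch-class distribution means. It uses only NumPy and does not call `wavescapes`; the helper name `third_coefficient` is hypothetical, and normalizing the magnitude by the zeroth coefficient (the total weight of the distribution) is an assumption made for illustration, not necessarily what the library does internally. The reading of magnitude as color intensity and phase as hue is likewise an assumption about how the plot is colored.

```python
import numpy as np

def third_coefficient(pitch_classes):
    """Return (normalized magnitude, phase) of the 3rd DFT coefficient.

    `pitch_classes` is an iterable of integers; each one adds a weight of 1
    to the corresponding entry of a 12-dimensional pitch-class vector.
    Hypothetical helper for illustration only; it expects at least one note.
    """
    pc_vector = np.zeros(12)
    for p in pitch_classes:
        pc_vector[p % 12] += 1.0

    # Discrete Fourier transform of the 12-dimensional pitch-class vector.
    spectrum = np.fft.fft(pc_vector)

    # Assumption: normalize by the zeroth coefficient (total weight),
    # so the magnitude lies between 0 and 1.
    magnitude = abs(spectrum[3]) / spectrum[0].real
    phase = float(np.angle(spectrum[3]))
    return magnitude, phase

# An augmented triad (C, E, G#) divides the octave into three equal parts.
aug_mag, aug_phase = third_coefficient([0, 4, 8])

# A major triad (C, E, G) is close to, but not exactly, that division.
maj_mag, maj_phase = third_coefficient([0, 4, 7])

print(f"augmented triad: magnitude = {aug_mag:.3f}")
print(f"major triad:     magnitude = {maj_mag:.3f}")
```

Under this normalization the augmented triad reaches the maximum magnitude of 1, while the major triad scores lower, which is why the third coefficient is a plausible lens for music that leans on augmented sonorities.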