# Final project guidelines

**Note:** Use these guidelines if and only if you are pursuing a **final project of your own design**. For those taking the final exam instead of the project, see the (separate) final exam notebook.

## Guidelines

These guidelines are intended for **undergraduates enrolled in INFO 3350**. If you are a graduate student enrolled in INFO 6350, you're welcome to consult the information below, but you have wider latitude to design and develop your project in line with your research goals.

### The task

Your task is to identify an interesting problem connected to the humanities or humanistic social sciences that's addressable with the help of computational methods, formulate a hypothesis about it, devise an experiment or experiments to test your hypothesis, present the results of your investigations, and discuss your findings.

These tasks essentially replicate the process of writing an academic paper. You can think of your project as a paper in miniature.

You are free to present each of these tasks as you see fit. You should use narrative text (that is, your own writing in a markdown cell), citations of others' work, numerical results, tables of data, and static and/or interactive visualizations as appropriate. Total length is flexible and depends on the number of people involved in the work, as well as the specific balance you strike between the ambition of your question and the sophistication of your methods. But be aware that numbers never, ever speak for themselves. Quantitative results presented without substantial discussion will not earn high marks. 

Your project should reflect a minimum of **ten hours** of work by each participant, though you will be graded on the quality of your work, not the amount of time it took you to produce it. Most high-quality projects represent twenty or more hours of work by each member.

#### Pick an important and interesting problem!

No amount of technical sophistication will overcome a fundamentally uninteresting problem at the core of your work. You have seen many pieces of successful computational humanities research over the course of the semester. You might use these as a guide to the kinds of problems that interest scholars in a range of humanities disciplines. You may also want to spend some time in the library, reading recent books and articles in the professional literature. **Problem selection and motivation are integral parts of the project.** Do not neglect them.

### Format

You should submit your project as a Jupyter notebook, along with all data necessary to reproduce your analysis. If your dataset is too large to share easily, let us know in advance so that we can find a workaround. If you have a reason to prefer a presentation format other than a notebook, likewise let us know so that we can discuss the options.

Your report should have four basic sections (provided in cells below for ease of reference):

1. **Introduction and hypothesis.** What problem are you working on? Why is it interesting and important? What have other people said about it? What do you expect to find?
2. **Corpus, data, and methods.** What data have you used? Where did it come from? How did you collect it? What are its limitations or omissions? What major methods will you use to analyze it? Why are those methods the appropriate ones?
3. **Results.** What did you find? How did you find it? How should we read your figures? Be sure to include confidence intervals or other measures of statistical significance or uncertainty where appropriate.
4. **Discussion and conclusions.** What does it all mean? Do your results support your hypothesis? Why or why not? What are the limitations of your study and how might those limitations be addressed in future work?

Within each of those sections, you may use as many code and markdown cells as you like. You may, of course, address additional questions or issues not listed above.
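
As one illustration of the uncertainty estimates requested under Results, here is a minimal sketch of a percentile-bootstrap confidence interval for a mean. The function name `bootstrap_mean_ci`, its defaults, and the sample numbers are all made up for illustration; other methods may suit your data better.

```python
import numpy as np

def bootstrap_mean_ci(values, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Draw index arrays with replacement: one row per bootstrap resample.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    means = values[idx].mean(axis=1)
    # The interval is read off the empirical quantiles of the resampled means.
    lower, upper = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return values.mean(), lower, upper

# Made-up measurements, for illustration only.
sample = [0.42, 0.82, 0.29, 0.43, 0.26, 0.59, 0.19, 0.50]
mean, lower, upper = bootstrap_mean_ci(sample)
print(f"mean = {mean:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Whatever interval you report, say in the accompanying text how it was computed and what it does and does not tell the reader.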

All code used in the project should be present in the notebook (except for widely available libraries that you import), but **be sure that we can read and understand your report in full without rerunning the code**. Be sure, too, to explain what you're doing along the way, both by describing your data and methods and by writing clean, well-commented code.

### Grading

This project takes the place of the take-home final exam for the course. It is worth 35% of your overall grade. You will be graded on the quality and ambition of each aspect of the project. No single component is more important than the others.

### Practical details

* The project is due at **noon on Saturday, December 9** via upload to CMS of a single zip file containing your fully executed Jupyter notebook and all associated data.
* You may work alone or in a group of up to three total members.
    * If you work in a group, be sure to list the names of the group members.
    * For groups, create your group on CMS and submit one notebook for the entire group. **Each group should also submit a statement of responsibility** that describes in general terms who performed which parts of the project.
* You may post questions on Ed, but you should do so privately (visible to course staff only).
* Interactive visualizations do not always work when embedded in shared notebooks. If you plan to use interactives, you may need to host them elsewhere and link to them.

---

## Your info

* NetID(s):
* Name(s):

---

## 1. Introduction and hypothesis

## 2. Corpus, data, and methods

## 3. Results

## 4. Discussion and conclusions

# Appendix

In [1]:
import pandas as pd
import numpy as np

In [2]:
spotify_csv = pd.read_csv("taylor_swift_data-main/Taylor_Swift_Spotify/taylor_swift_spotify_data.csv", index_col = "Playlist ID")
spotify_csv.head(8)

Playlist ID,URI,Album,Song Name,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Duration_ms,Time Signature
1.0,0Om9WAB5RS09L80DyOfTNa,Taylor Swift,Tim McGraw,0.58,0.491,0.0,-6.462,1.0,0.0251,0.575,0.0,0.121,0.425,76.009,232107.0,4.0
2.0,32mVHdy0bi1XKgr0ajsBlG,Taylor Swift,Picture To Burn,0.658,0.877,7.0,-2.098,1.0,0.0323,0.173,0.0,0.0962,0.821,105.586,173067.0,4.0
3.0,7zMcNqs55Mxer82bvZFkpg,Taylor Swift,Teardrops On My Guitar - Radio Single Remix,0.621,0.417,10.0,-6.941,1.0,0.0231,0.288,0.0,0.119,0.289,99.953,203040.0,4.0
4.0,73OX8GdpOeGzKC6OvGSbsv,Taylor Swift,A Place in this World,0.576,0.777,9.0,-2.881,1.0,0.0324,0.051,0.0,0.32,0.428,115.028,199200.0,4.0
5.0,7an1exwMnfYRcdVQm0yDev,Taylor Swift,Cold As You,0.418,0.482,5.0,-5.769,1.0,0.0266,0.217,0.0,0.123,0.261,175.558,239013.0,4.0
6.0,2QA3IixpRcKyOdG7XDzRgv,Taylor Swift,The Outside,0.589,0.805,5.0,-4.055,1.0,0.0293,0.00491,0.0,0.24,0.591,112.982,207107.0,4.0
7.0,6K0CJLVXqbGMeJSmJ4ENKK,Taylor Swift,Tied Together with a Smile,0.479,0.578,2.0,-4.963,1.0,0.0294,0.525,0.0,0.0841,0.192,146.165,248107.0,4.0
8.0,2ZoOmCSgj0ypVAmGd1ve4y,Taylor Swift,Stay Beautiful,0.594,0.629,8.0,-4.919,1.0,0.0246,0.0868,0.0,0.137,0.504,131.597,236053.0,4.0


In [3]:
spotify_csv.columns

Index(['URI', 'Album', 'Song Name', 'Danceability', 'Energy', 'Key',
       'Loudness', 'Mode', 'Speechiness', 'Acousticness', 'Instrumentalness',
       'Liveness', 'Valence', 'Tempo', 'Duration_ms', 'Time Signature'],
      dtype='object')

In [4]:
song_lyrics_csv = pd.read_csv("taylor_swift_data-main/Taylor_Swift_Genius/taylor_swift_genius_data.csv", index_col = "index")
song_lyrics_csv.head(8)

index,Album,Song Name,Lyrics
0,Taylor Swift,Mary's Song (Oh My My My),She said I was seven and you were nine I looke...
1,Taylor Swift,A Perfectly Good Heart,Why would you wanna break A perfectly good hea...
2,Taylor Swift,Tim McGraw,He said the way my blue eyes shined Put those ...
3,Taylor Swift,Teardrops On My Guitar,Drew looks at me I fake a smile so he won't se...
4,Taylor Swift,Cold as You,You have a way of coming easily to me And when...
5,Taylor Swift,The Outside,I didn't know what I would find When I went lo...
6,Taylor Swift,Should've Said No,It's strange to think the songs we used to sin...
7,Taylor Swift,A Place In This World,"I don't know what I want, so don't ask me 'Cau..."


A quick check of how to select one album's songs and read the lyrics of a single track.

In [5]:
# Filter to the debut album, then pull the lyrics stored under index label 1.
song_lyrics_csv.loc[song_lyrics_csv['Album'] == 'Taylor Swift', 'Lyrics'][1]

"Why would you wanna break A perfectly good heart? Why would you wanna take Our love and tear it all apart now? Why would you wanna make The very first scar? Why would you wanna break A perfectly good heart?  Maybe I should've seen the signs Should've read the writing on the wall And realized by the distance in your eyes That I would be the one to fall No matter what you say I still can't believe that you would walk away It don't make sense to me, but  Why would you wanna break A perfectly good heart? Why would you wanna take Our love and tear it all apart now? Why would you wanna make The very first scar? Why would you wanna break A perfectly good heart?  It's not unbroken anymore (It's not unbroken anymore) How do I get it back the way it was before?  Why would you wanna break A perfectly good heart? Why would you wanna take Our love and tear it all apart now? Why would you wanna make The very first scar? Why would you wanna break— (Why) Would you wanna break it? You might also like 
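
The lyrics printed above end with the phrase "You might also like", which looks like page residue from the lyrics source rather than part of the song. A minimal sketch of one way to strip it, assuming the phrase never occurs as a genuine lyric (worth checking before applying it to the whole corpus); `clean_lyrics` and `RESIDUE` are hypothetical names:

```python
import pandas as pd

# Assumption: this phrase is scraping residue, not a lyric.
RESIDUE = "You might also like"

def clean_lyrics(lyrics: pd.Series) -> pd.Series:
    """Remove the residue phrase, collapse runs of whitespace, and trim."""
    return (
        lyrics.str.replace(RESIDUE, " ", regex=False)
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
    )

# Toy input: the tail of the lyrics shown above.
demo = pd.Series(["Would you wanna break it? You might also like "])
cleaned = clean_lyrics(demo)
print(cleaned[0])
```

Cleaning of this kind changes any character-based length measure, so it would need to run before such measures are computed.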

In [6]:
full_dataframe = song_lyrics_csv.merge(spotify_csv, how='inner', on=['Song Name', 'Album'])
full_dataframe.head(3)

,Album,Song Name,Lyrics,URI,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Duration_ms,Time Signature
0,Taylor Swift,Mary's Song (Oh My My My),She said I was seven and you were nine I looke...,2QrQCMel6v2JiLxqrg4p2O,0.403,0.627,2.0,-5.28,1.0,0.0292,0.0177,0.0,0.182,0.374,74.9,213080.0,4.0
1,Taylor Swift,A Perfectly Good Heart,Why would you wanna break A perfectly good hea...,1spLfUJxtyVyiKKTegQ2r4,0.483,0.751,4.0,-5.726,1.0,0.0365,0.00349,0.0,0.128,0.268,156.092,220147.0,4.0
2,Taylor Swift,Tim McGraw,He said the way my blue eyes shined Put those ...,0Om9WAB5RS09L80DyOfTNa,0.58,0.491,0.0,-6.462,1.0,0.0251,0.575,0.0,0.121,0.425,76.009,232107.0,4.0
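
The two previews above spell some titles differently ("Cold As You" vs. "Cold as You", "A Place in this World" vs. "A Place In This World"), and an exact-match inner join silently drops such rows. Below is a minimal sketch of how one might detect and partly repair this, using toy frames built from titles shown above. The `song_key` column is a hypothetical helper, not part of either dataset.

```python
import pandas as pd

# Toy frames using titles visible in the two previews above.
lyrics = pd.DataFrame({
    "Album": ["Taylor Swift"] * 3,
    "Song Name": ["Tim McGraw", "Cold as You", "A Place In This World"],
})
audio = pd.DataFrame({
    "Album": ["Taylor Swift"] * 3,
    "Song Name": ["Tim McGraw", "Cold As You", "A Place in this World"],
})

# 1. Diagnose: an outer merge with indicator=True labels each row's origin.
check = lyrics.merge(audio, how="outer", on=["Song Name", "Album"], indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Song Name", "_merge"]])

# 2. One possible repair: join on a case-folded helper key instead.
for frame in (lyrics, audio):
    frame["song_key"] = frame["Song Name"].str.casefold().str.strip()

merged = lyrics.merge(audio, on=["song_key", "Album"], suffixes=("_lyrics", "_audio"))
print(len(merged), "rows matched after case folding")
```

Case folding would not reconcile titles that differ by more than capitalization, such as "Teardrops On My Guitar - Radio Single Remix"; those would need a manual mapping.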


In [12]:
len(full_dataframe['Lyrics'][1])

1214

In [9]:
# Character count of each song's lyrics (spaces and punctuation included).
full_dataframe['Lyrics Length'] = full_dataframe['Lyrics'].str.len()
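
`Lyrics Length` counts characters, so spaces and punctuation contribute to it. If a word-level measure would be more useful, a whitespace-split count is one simple option. The sketch below uses a toy frame, and the `Word Count` column name is hypothetical:

```python
import pandas as pd

# Toy frame standing in for the merged data.
demo = pd.DataFrame({"Lyrics": ["Why would you wanna break", "A perfectly good heart?"]})

# Character count, as in the cell above.
demo["Lyrics Length"] = demo["Lyrics"].str.len()

# Word count: split on whitespace, then count the resulting tokens.
demo["Word Count"] = demo["Lyrics"].str.split().str.len()
print(demo)
```

Whitespace splitting treats "heart?" as a single token; a proper tokenizer would be needed for finer-grained counts.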

In [16]:
full_dataframe.head(4)

,Album,Song Name,Lyrics,URI,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Duration_ms,Time Signature,Lyrics Length
0,Taylor Swift,Mary's Song (Oh My My My),She said I was seven and you were nine I looke...,2QrQCMel6v2JiLxqrg4p2O,0.403,0.627,2.0,-5.28,1.0,0.0292,0.0177,0.0,0.182,0.374,74.9,213080.0,4.0,1602
1,Taylor Swift,A Perfectly Good Heart,Why would you wanna break A perfectly good hea...,1spLfUJxtyVyiKKTegQ2r4,0.483,0.751,4.0,-5.726,1.0,0.0365,0.00349,0.0,0.128,0.268,156.092,220147.0,4.0,1214
2,Taylor Swift,Tim McGraw,He said the way my blue eyes shined Put those ...,0Om9WAB5RS09L80DyOfTNa,0.58,0.491,0.0,-6.462,1.0,0.0251,0.575,0.0,0.121,0.425,76.009,232107.0,4.0,1801
3,Taylor Swift,The Outside,I didn't know what I would find When I went lo...,2QA3IixpRcKyOdG7XDzRgv,0.589,0.805,5.0,-4.055,1.0,0.0293,0.00491,0.0,0.24,0.591,112.982,207107.0,4.0,1114


In [18]:
full_dataframe = full_dataframe.drop(columns = ['URI'])

In [20]:
full_dataframe.head(5)

,Album,Song Name,Lyrics,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Duration_ms,Time Signature,Lyrics Length
0,Taylor Swift,Mary's Song (Oh My My My),She said I was seven and you were nine I looke...,0.403,0.627,2.0,-5.28,1.0,0.0292,0.0177,0.0,0.182,0.374,74.9,213080.0,4.0,1602
1,Taylor Swift,A Perfectly Good Heart,Why would you wanna break A perfectly good hea...,0.483,0.751,4.0,-5.726,1.0,0.0365,0.00349,0.0,0.128,0.268,156.092,220147.0,4.0,1214
2,Taylor Swift,Tim McGraw,He said the way my blue eyes shined Put those ...,0.58,0.491,0.0,-6.462,1.0,0.0251,0.575,0.0,0.121,0.425,76.009,232107.0,4.0,1801
3,Taylor Swift,The Outside,I didn't know what I would find When I went lo...,0.589,0.805,5.0,-4.055,1.0,0.0293,0.00491,0.0,0.24,0.591,112.982,207107.0,4.0,1114
4,Taylor Swift,Should've Said No,It's strange to think the songs we used to sin...,0.476,0.777,4.0,-3.771,0.0,0.0289,0.0103,0.0,0.196,0.472,167.964,242200.0,4.0,1801
