# Tidy Tuesday Project for September 17th, 2024

Hello there, folks! This is my first #tidytuesday project. I am currently learning the [Julia](https://julialang.org/) programming language, so that is what I use in this notebook.

Please feel free to critique me if you are a Julia programmer!

## The Shakespeare Dialogue Dataset

Thanks to [nrennie](https://github.com/nrennie), we have access to the [shakespeare dataset](https://github.com/nrennie/shakespeare).

The author of the dataset scraped the data from [shakespeare.mit.edu](https://shakespeare.mit.edu/).

Let's get to it!

### Setup

In [None]:
# Install the required packages (only needed once per environment)
using Pkg
Pkg.add(["CSV", "DataFrames", "HTTP", "Statistics", "StatsPlots", "Plots"])

# Load the packages
using CSV, DataFrames, HTTP, Statistics, StatsPlots, Plots

### Import data

In [None]:
# Read directly from GitHub
hamlet_url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-09-17/hamlet.csv"
macbeth_url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-09-17/macbeth.csv"
romeo_juliet_url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-09-17/romeo_juliet.csv"

# Create DataFrames
hamlet = CSV.read(HTTP.get(hamlet_url).body, DataFrame)
macbeth = CSV.read(HTTP.get(macbeth_url).body, DataFrame)
romeo_juliet = CSV.read(HTTP.get(romeo_juliet_url).body, DataFrame);


### Hamlet 📚

In [None]:
# Basic Description
describe(hamlet)

In [None]:
# Number of unique characters
unique_characters = unique(hamlet.character)
println("Number of unique characters: ", length(unique_characters))

In [None]:
# Most frequent characters
character_counts = combine(groupby(hamlet, :character), nrow => :Count)
sorted_counts = sort(character_counts, :Count, rev=true)
println("Most frequent characters:\n", first(sorted_counts, 10))

In [None]:
# Distribution of dialogue lengths (number of characters per row of dialogue)
hamlet.dialogue_length = length.(hamlet.dialogue)

histogram(hamlet.dialogue_length,
    bins=20,
    title="Distribution of Dialogue Lengths in Hamlet",
    xlabel="Dialogue Length (characters)",
    ylabel="Count"
)

In [None]:
# Plot line number distribution by act, with points grouped by scene
scatter(hamlet.line_number,
    hamlet.act,
    group=hamlet.scene,
    legend=:topright,
    title="Line Number Distribution by Act and Scene in Hamlet",
    xlabel="Line Number",
    ylabel="Act"
)

### Macbeth 🗡️

In [None]:
# Basic Description
describe(macbeth)

In [None]:
# Number of unique characters
unique_characters_macbeth = unique(macbeth.character)
println("Number of unique characters: ", length(unique_characters_macbeth))

In [None]:
# Most frequent characters
character_counts_macbeth = combine(groupby(macbeth, :character), nrow => :Count)
sorted_counts_macbeth = sort(character_counts_macbeth, :Count, rev=true)
println("Most frequent characters:\n", first(sorted_counts_macbeth, 10))

In [None]:
# Distribution of dialogue lengths (number of characters per row of dialogue)
macbeth.dialogue_length = length.(macbeth.dialogue)

histogram(macbeth.dialogue_length,
    bins=20,
    title="Distribution of Dialogue Lengths in Macbeth",
    xlabel="Dialogue Length (characters)",
    ylabel="Count"
)

In [None]:
# Plot line number distribution by act, with points grouped by scene
scatter(macbeth.line_number,
    macbeth.act,
    group=macbeth.scene,
    legend=:topright,
    title="Line Number Distribution by Act and Scene in Macbeth",
    xlabel="Line Number",
    ylabel="Act"
)

### Romeo & Juliet ❤️

In [None]:
# Basic Description
describe(romeo_juliet)

In [None]:
# Number of unique characters
unique_characters_rj = unique(romeo_juliet.character)
println("Number of unique characters: ", length(unique_characters_rj))

In [None]:
# Most frequent characters
character_counts_rj = combine(groupby(romeo_juliet, :character), nrow => :Count)
sorted_counts_rj = sort(character_counts_rj, :Count, rev=true)
println("Most frequent characters:\n", first(sorted_counts_rj, 10))

In [None]:
# Distribution of dialogue lengths (number of characters per row of dialogue)
romeo_juliet.dialogue_length = length.(romeo_juliet.dialogue)

histogram(romeo_juliet.dialogue_length,
    bins=20,
    title="Distribution of Dialogue Lengths in Romeo & Juliet",
    xlabel="Dialogue Length (characters)",
    ylabel="Count"
)

In [None]:
# Plot line number distribution by act, with points grouped by scene
scatter(romeo_juliet.line_number,
    romeo_juliet.act,
    group=romeo_juliet.scene,
    legend=:topright,
    title="Line Number Distribution by Act and Scene in Romeo & Juliet",
    xlabel="Line Number",
    ylabel="Act"
)
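
### A possible refactor

The three sections above repeat the same "count rows per character, sort, take the top few" steps for each play. Here is a minimal sketch of how that could be wrapped in one function. The name `top_speakers` and the `toy` DataFrame are my own illustrations, not part of the dataset or of DataFrames.jl; the only assumption about the data is a `character` column, as in the cells above.

```julia
using DataFrames

# Hypothetical helper (not part of the original analysis): returns the `n`
# characters that appear in the most rows of a play's DataFrame.
function top_speakers(df::AbstractDataFrame, n::Integer=10)
    counts = combine(groupby(df, :character), nrow => :Count)
    sorted = sort(counts, :Count, rev=true)
    return first(sorted, n)  # returns fewer rows if the play has fewer than n characters
end

# Toy data standing in for one of the play DataFrames
toy = DataFrame(character = ["Hamlet", "Horatio", "Hamlet", "Ophelia", "Hamlet", "Horatio"])

top2 = top_speakers(toy, 2)
println(top2)
```

With the notebook's data loaded, calling `top_speakers(hamlet)`, `top_speakers(macbeth)`, and `top_speakers(romeo_juliet)` should give the same tables as the three separate cells.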