# A sentiment(al) analysis of why Red Dwarf is no longer funny (to me)
## Part II: How can the same smeg happen to the same show twice?

What follows is a continuation of the (faux) analysis of the humour in Red Dwarf and why it seems to be waning in recent series. The first part comprises a blog post and a Jupyter notebook, which can be found [here](http://ian.bebbs.co.uk/posts/ASentimentalAnalysisOfRedDwarf) and [here](https://github.com/ibebbs/RedDwarfAnalysis/blob/master/Investigation.ipynb) respectively.

It uses the following packages:

In [1]:
#load "Paket.fsx"

Paket.Package [ "FsLab" ]

In [2]:
#r "packages/FSharp.Data/lib/net40/FSharp.Data.dll"
#r "packages/Google.DataTable.Net.Wrapper/lib/Google.DataTable.Net.Wrapper.dll"
#r "packages/XPlot.GoogleCharts/lib/net45/XPlot.GoogleCharts.dll"
#r "packages/MathNet.Numerics/lib/net40/MathNet.Numerics.dll"
#r "packages/MathNet.Numerics.FSharp/lib/net40/MathNet.Numerics.FSharp.dll"

open System
open System.IO
open System.Text.RegularExpressions
open FSharp.Data
open FSharp.Data.JsonExtensions
open XPlot
open XPlot.GoogleCharts
open MathNet.Numerics.Statistics

Despite my commit of the display printer below being [merged into the IfSharp package](https://github.com/fsprojects/IfSharp/issues/118#issuecomment-287387603), it seems it is still required here:

In [6]:
open IfSharp.Kernel.App

@"<script src=""https://www.google.com/jsapi""></script>" |> Util.Html |> Display

type XPlot.GoogleCharts.GoogleChart with
  member __.GetContentHtml() =
    let html = __.GetInlineHtml()
    html
      .Replace ("google.setOnLoadCallback(drawChart);", "google.load('visualization', '1.0', { packages: ['corechart'], callback: drawChart })")

type XPlot.GoogleCharts.Chart with
  static member Content (chart : GoogleChart) =
    { ContentType = "text/html"; Data = chart.GetContentHtml() }

AddDisplayPrinter (fun (plot: XPlot.GoogleCharts.GoogleChart) -> { ContentType = "text/html"; Data = plot.GetContentHtml() })

## Observation
In the previous analysis, I posited that several of the later series of Red Dwarf were no longer as funny as the first few series. I decided to use viewer ratings from various sites (but mainly IMDB) to see if most people agreed with this argument or if I had simply lost my ability to appreciate the humour.

By the end of the analysis I had found that, while there was a general deterioration in the viewer ratings for later series of Red Dwarf, the deterioration was nowhere near as pronounced as I imagined and therefore concluded that I must simply be a **miserable old git**. However, I also noted that [sentiment analysis](https://en.wikipedia.org/wiki/Sentiment_analysis) might provide further insight into the deterioration of viewer ratings and, specifically, my impression of the later series.

This analysis endeavours to employ this approach to see if the general sentiment expressed within episodes of the show and/or by specific characters has changed and, if so, what effect this has had on viewer rating.

## Information
In order to employ sentiment analysis, we need the text content of each episode, ideally attributed to each character. Fortunately, as Red Dwarf has such a dedicated fan base, it wasn't difficult to find 'transcripts' for each [episode of the show](http://www.ladyofthecake.com/reddwarf/html/scripts.html).

These were downloaded to a 'Transcripts' directory and the index of each episode updated to include the name of the transcript for the episode, as shown below:

In [23]:
type TranscriptFormat =
| SameLine
| NextLine
| NextLineDoubleSpaced

type EpisodeSource = {
    Id : string
    Transcript : string option
    Format : TranscriptFormat
    Season : int
    Episode : int
}

let episodeSources = [
    { Id = "tt0684181"; Transcript = Some "Theend.txt"; Format = SameLine; Season = 1; Episode = 1 };
    { Id = "tt0684157"; Transcript = Some "Futureec.txt"; Format = SameLine; Season = 1; Episode = 2 };
    { Id = "tt0684145"; Transcript = Some "Balanceo.txt"; Format = SameLine; Season = 1; Episode = 3 };
    { Id = "tt0684186"; Transcript = Some "Waitingf.txt"; Format = SameLine; Season = 1; Episode = 4 };
    { Id = "tt0684151"; Transcript = Some "Confiden.txt"; Format = SameLine; Season = 1; Episode = 5 };
    { Id = "tt0684165"; Transcript = Some "Me2.txt"; Format = SameLine; Season = 1; Episode = 6 };
    { Id = "tt0684161"; Transcript = Some "Kryten.txt"; Format = SameLine; Season = 2; Episode = 1 };
    { Id = "tt0684146"; Transcript = Some "Betterth.txt"; Format = SameLine; Season = 2; Episode = 2 };
    { Id = "tt0684180"; Transcript = Some "Thanksfo.txt"; Format = SameLine; Season = 2; Episode = 3 };
    { Id = "tt0684177"; Transcript = Some "Stasisle.txt"; Format = SameLine; Season = 2; Episode = 4 };
    { Id = "tt0684175"; Transcript = Some "Queeg.txt"; Format = SameLine; Season = 2; Episode = 5 };
    { Id = "tt0684169"; Transcript = Some "Paralle.txt"; Format = SameLine; Season = 2; Episode = 6 };
    { Id = "tt0684144"; Transcript = Some "Backward.txt"; Format = SameLine; Season = 3; Episode = 1 };
    { Id = "tt0767232"; Transcript = Some "Marooned.txt"; Format = SameLine; Season = 3; Episode = 2 };
    { Id = "tt0684172"; Transcript = Some "Polymorp.txt"; Format = SameLine; Season = 3; Episode = 3 };
    { Id = "tt0684148"; Transcript = Some "Bodyswap.txt"; Format = SameLine; Season = 3; Episode = 4 };
    { Id = "tt0684185"; Transcript = Some "Timeslid.txt"; Format = SameLine; Season = 3; Episode = 5 };
    { Id = "tt0684183"; Transcript = Some "Thelastd.txt"; Format = SameLine; Season = 3; Episode = 6 };
    { Id = "tt0684149"; Transcript = Some "Camille.txt"; Format = SameLine; Season = 4; Episode = 1 };
    { Id = "tt0684152"; Transcript = Some "Dna.txt"; Format = SameLine; Season = 4; Episode = 2 };
    { Id = "tt0684160"; Transcript = Some "Justice.txt"; Format = SameLine; Season = 4; Episode = 3 };
    { Id = "tt0684187"; Transcript = Some "Whitehol.txt"; Format = SameLine; Season = 4; Episode = 4 };
    { Id = "tt0684153"; Transcript = Some "Dimensio.txt"; Format = SameLine; Season = 4; Episode = 5 };
    { Id = "tt0684164"; Transcript = Some "Meltdown.txt"; Format = SameLine; Season = 4; Episode = 6 };
    { Id = "tt0684159"; Transcript = Some "Holoship.txt"; Format = SameLine; Season = 5; Episode = 1 };
    { Id = "tt0684182"; Transcript = Some "Inquisit.txt"; Format = SameLine; Season = 5; Episode = 2 };
    { Id = "tt0684179"; Transcript = Some "Terrorfo.txt"; Format = SameLine; Season = 5; Episode = 3 };
    { Id = "tt0684174"; Transcript = Some "Quarinti.txt"; Format = SameLine; Season = 5; Episode = 4 };
    { Id = "tt0756588"; Transcript = Some "Demonsan.txt"; Format = SameLine; Season = 5; Episode = 5 };
    { Id = "tt0684143"; Transcript = Some "Backtore.txt"; Format = SameLine; Season = 5; Episode = 6 };
    { Id = "tt0684173"; Transcript = Some "Psirens.txt"; Format = SameLine; Season = 6; Episode = 1 };
    { Id = "tt0684163"; Transcript = Some "Legion.txt"; Format = SameLine; Season = 6; Episode = 2 };
    { Id = "tt0684158"; Transcript = Some "Gunmen.txt"; Format = SameLine; Season = 6; Episode = 3 };
    { Id = "tt0684155"; Transcript = Some "Emohawk.txt"; Format = SameLine; Season = 6; Episode = 4 };
    { Id = "tt0684176"; Transcript = Some "Rimmerwo.txt"; Format = SameLine; Season = 6; Episode = 5 };
    { Id = "tt0756589"; Transcript = Some "Outoftim.txt"; Format = SameLine; Season = 6; Episode = 6 };
    { Id = "tt0684184"; Transcript = Some "tikka.txt"; Format = NextLine; Season = 7; Episode = 1 };
    { Id = "tt0684178"; Transcript = Some "stoak.txt"; Format = NextLineDoubleSpaced; Season = 7; Episode = 2 };
    { Id = "tt0684168"; Transcript = Some "ouroboros.txt"; Format = NextLine; Season = 7; Episode = 3 };
    { Id = "tt0684154"; Transcript = Some "ductsoup.txt"; Format = NextLine; Season = 7; Episode = 4 };
    { Id = "tt0756587"; Transcript = Some "blue.txt"; Format = NextLineDoubleSpaced; Season = 7; Episode = 5 };
    { Id = "tt0684147"; Transcript = Some "beyond.txt"; Format = NextLineDoubleSpaced; Season = 7; Episode = 6 };
    { Id = "tt0684156"; Transcript = Some "epideme.txt"; Format = NextLineDoubleSpaced; Season = 7; Episode = 7 };
    { Id = "tt0684166"; Transcript = Some "nanarchy.txt"; Format = NextLineDoubleSpaced; Season = 7; Episode = 8 };
    { Id = "tt0684140"; Transcript = Some "bir1.txt"; Format = NextLine; Season = 8; Episode = 1 };
    { Id = "tt0684141"; Transcript = Some "bir2.txt"; Format = NextLine; Season = 8; Episode = 2 };
    { Id = "tt0684142"; Transcript = None; Format = NextLine; Season = 8; Episode = 3 };
    { Id = "tt0684150"; Transcript = Some "cassandra.txt"; Format = NextLine; Season = 8; Episode = 4 };
    { Id = "tt0684162"; Transcript = Some "krytietv.txt"; Format = NextLine; Season = 8; Episode = 5 };
    { Id = "tt0684170"; Transcript = Some "pete1.txt"; Format = NextLine; Season = 8; Episode = 6 };
    { Id = "tt0684171"; Transcript = Some "pete2.txt"; Format = NextLine; Season = 8; Episode = 7 };
    { Id = "tt0684167"; Transcript = Some "otg.txt"; Format = NextLine; Season = 8; Episode = 8 };
    { Id = "tt1365540"; Transcript = None; Format = NextLine; Season = 9; Episode = 1 };
    { Id = "tt1371606"; Transcript = None; Format = NextLine; Season = 9; Episode = 2 };
    { Id = "tt1400975"; Transcript = None; Format = NextLine; Season = 9; Episode = 3 };
    { Id = "tt1997038"; Transcript = None; Format = NextLine; Season = 10; Episode = 1 };
    { Id = "tt1999714"; Transcript = None; Format = NextLine; Season = 10; Episode = 2 };
    { Id = "tt1999715"; Transcript = None; Format = NextLine; Season = 10; Episode = 3 };
    { Id = "tt1999716"; Transcript = None; Format = NextLine; Season = 10; Episode = 4 };
    { Id = "tt1999717"; Transcript = None; Format = NextLine; Season = 10; Episode = 5 };
    { Id = "tt1999718"; Transcript = None; Format = NextLine; Season = 10; Episode = 6 };
    { Id = "tt5218244"; Transcript = None; Format = NextLine; Season = 11; Episode = 1 };
    { Id = "tt5218254"; Transcript = None; Format = NextLine; Season = 11; Episode = 2 };
    { Id = "tt5218266"; Transcript = None; Format = NextLine; Season = 11; Episode = 3 };
    { Id = "tt5218284"; Transcript = None; Format = NextLine; Season = 11; Episode = 4 };
    { Id = "tt5218308"; Transcript = None; Format = NextLine; Season = 11; Episode = 5 };
    { Id = "tt5218316"; Transcript = None; Format = NextLine; Season = 11; Episode = 6 }
]

These transcripts were then parsed to remove scene description, production notes, accentuations and other irrelevant information and to attribute each of the remaining lines to one of the five main characters of Red Dwarf: 

In [8]:
type Character =
| Lister
| Rimmer
| Cat
| Kryten
| Holly

let characterName character =
    match character with
    | Lister -> "LISTER"
    | Rimmer -> "RIMMER"
    | Cat -> "CAT"
    | Kryten -> "KRYTEN"
    | Holly -> "HOLLY"

This presented several challenges as the transcripts were in a number of formats (text on the same line as the character name, text on the line after the character name, text on the line after the character name but each line double spaced).

The first format, 'SameLine', looks as follows:

```
RIMMER: Is that a cigarette you're smoking, Lister?
LISTER: No, it's a chicken.
RIMMER: Right!  You're on report.  Two times in as many minutes, Lister!
  I don't know.
```

This format contains speech on the same line as the character name, with extended speech wrapped onto subsequent lines and indented by two spaces. To read this format, a ```SameLineScanner``` type was defined that joins lines of speech such that it can easily be used by the ```Seq.scan``` function to successfully parse a sequence of strings representing text from the episode. The ```SameLineScanner``` is shown below:

In [9]:
type SameLineScanner = {
    LineBuilder : string
    Lines : string seq
} with
    static member Empty = { LineBuilder = ""; Lines = [] }
    static member Scan scanner (line : string) =
        match line.StartsWith("  ") with
        | true -> 
            let result = sprintf "%s %s" scanner.LineBuilder (line.Trim())
            { scanner with LineBuilder = result; Lines = [] }
        | false -> 
            { scanner with LineBuilder = line; Lines = [ scanner.LineBuilder ]}

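let parseSameLine (lines : string seq) =
    // Add an empty sentinel line at the END of the input so the final line of speech is flushed
    // (piping into Seq.append would place the sentinel at the start instead)
    Seq.append lines (Seq.singleton "")
    |> Seq.scan SameLineScanner.Scan SameLineScanner.Empty
    |> Seq.collect (fun s -> s.Lines )
    // Strip bracketed directions and emphasis markers, then normalise spacing and quotes
    |> Seq.map (fun l -> Regex.Replace(l, @"\(.*?\)", "").Replace("_", "").Replace("*", "").Replace("  ", " ").Replace('"', '\'').Trim())
    |> Seq.where (fun l -> not (String.IsNullOrWhiteSpace(l)))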
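The scan-and-join idea can also be seen in isolation. The following is a minimal standalone sketch, not the notebook's code: `joinContinuations` is a hypothetical helper that carries a (line being built, line just completed) pair through `Seq.scan`, and it assumes an empty sentinel line placed at the end of the input so that the final line is emitted.

```fsharp
open System

// Hypothetical helper for illustration only: joins indented continuation
// lines onto the line that precedes them.
let joinContinuations (lines : string seq) =
    // State: (line currently being built, line completed at this step, if any)
    let step (building : string, _ : string option) (line : string) =
        if line.StartsWith("  ") then
            (sprintf "%s %s" building (line.Trim()), None)
        else
            (line, Some building)
    Seq.append lines (Seq.singleton "")   // trailing sentinel flushes the last line
    |> Seq.scan step ("", None)
    |> Seq.choose snd
    |> Seq.filter (fun l -> not (String.IsNullOrWhiteSpace l))
    |> Seq.toList

let sample =
    [ "RIMMER: Is that a cigarette you're smoking, Lister?"
      "LISTER: No, it's a chicken."
      "RIMMER: Right!  You're on report.  Two times in as many minutes, Lister!"
      "  I don't know." ]

let joined = joinContinuations sample
joined |> List.iter (printfn "%s")
```

With the sample above, the four input lines collapse into three, the last of which ends with the wrapped "I don't know."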

The other two formats - 'NextLine' and 'NextLineDoubleSpaced' - look like this:

```
RIMMER
  "Dear Mister Lister, your appeal has been successful"!
  "From this day forth all inmates with no record of violence or depression
will be allowed... to have strings on their guitars"...
  This appeal was all about guitar strings?

LISTER
  You didn't think it was about getting out of here, did you?

RIMMER
  You mean to say I've been busting my balls so you can have strings on your
lousy, stinking guitar??

LISTER
  You've been a brick, man. And as a personal 'thank you', I thought I'd
write you a song...
```

'NextLineDoubleSpaced' is just that: the same layout with an additional empty line between each line of the transcript. To process these transcripts, a similar approach was employed as with the 'SameLine' format, namely a ```NextLineScanner``` was written that uses character names and blank lines to delimit the speech of each character:

In [10]:
type NextLineScanner = {
    ReadingCharacter : Character option
    Spacing : int
    CurrentSpacing : int
    LineBuilder : string
    Lines : string seq
} with
    static member For spacing = { ReadingCharacter = None; Spacing = spacing; CurrentSpacing = 0; LineBuilder = ""; Lines = [] }
    static member Scan scanner (line : string) =
        match (line.ToUpper()) with
        | "LISTER" -> { scanner with ReadingCharacter = Some Lister; CurrentSpacing = 0; LineBuilder = ""; Lines = [ ] }
        | "RIMMER" -> { scanner with ReadingCharacter = Some Rimmer; CurrentSpacing = 0; LineBuilder = ""; Lines = [ ] }
        | "CAT" -> { scanner with ReadingCharacter = Some Cat; CurrentSpacing = 0; LineBuilder = ""; Lines = [ ] }
        | "KRYTEN" -> { scanner with ReadingCharacter = Some Kryten; CurrentSpacing = 0; LineBuilder = ""; Lines = [ ] }
        | "HOLLY" -> { scanner with ReadingCharacter = Some Holly; CurrentSpacing = 0; LineBuilder = ""; Lines = [ ] }
        | _ ->
            match scanner.ReadingCharacter, String.IsNullOrWhiteSpace(line) with
            | Some character, false -> { scanner with LineBuilder = (sprintf "%s %s" scanner.LineBuilder (line.Trim())); CurrentSpacing = 0; Lines = [] }
            | Some character, true when scanner.CurrentSpacing < scanner.Spacing ->  { scanner with CurrentSpacing = scanner.CurrentSpacing + 1; Lines = [] }
            | Some character, true -> { scanner with ReadingCharacter = None; CurrentSpacing = 0; LineBuilder = ""; Lines = [ (sprintf "%s: %s" (characterName character) scanner.LineBuilder) ] }
            | _, _ -> { scanner with ReadingCharacter = None; LineBuilder = ""; Lines = [ ] }

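let parseNextLine (lines : string seq) spacing =
    // Add blank sentinel lines at the END of the input so the final speech is flushed;
    // 'spacing + 1' blanks are needed to get past the permitted spacing
    Seq.append lines (Seq.replicate (spacing + 1) "")
    |> Seq.scan NextLineScanner.Scan (NextLineScanner.For spacing)
    |> Seq.collect (fun s -> s.Lines )
    // Strip bracketed directions and emphasis markers, then normalise spacing and quotes
    |> Seq.map (fun l -> Regex.Replace(l, @"\(.*?\)", "").Replace("_", "").Replace("*", "").Replace("  ", " ").Replace('"', '\'').Trim())
    |> Seq.where (fun l -> not (String.IsNullOrWhiteSpace(l)))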

These two 'scanners' were then used to parse valid lines from all episodes into a common (SameLine) format:

In [12]:
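// Note: File.ReadAllLines needs a local path, so download the directory below and point 'transcripts' at the local copy
let transcripts = "https://github.com/ibebbs/RedDwarfAnalysis/tree/master/Transcript"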

let episodeLines =
    episodeSources
    |> Seq.where (fun s -> s.Transcript.IsSome)
    |> Seq.map (fun s -> (s.Season, s.Episode, s.Format, (File.ReadAllLines(Path.Combine(transcripts, s.Transcript.Value)))))
    |> Seq.collect (fun (season, episode, format, lines) -> 
        let parsedLines =
            match format with
            | SameLine -> parseSameLine lines
            | NextLine -> parseNextLine lines 0
            | NextLineDoubleSpaced -> parseNextLine lines 1
        parsedLines
        |> Seq.map (fun line -> (season, episode, line)))

[Active patterns](https://docs.microsoft.com/en-us/dotnet/articles/fsharp/language-reference/active-patterns) were then used to associate each line with a character:

In [13]:
let (|IsLister|_|) (line : string) =
    match line.ToUpper().StartsWith("LISTER:") with
    | true when line.Length > 8 -> Some (line.Substring(8))
    | _ -> None
    
let (|IsKryten|_|) (line : string) =
    match line.ToUpper().StartsWith("KRYTEN:") with
    | true when line.Length > 8 -> Some (line.Substring(8))
    | _ -> None

let (|IsRimmer|_|) (line : string) =
    match line.ToUpper().StartsWith("RIMMER:") with
    | true when line.Length > 8 -> Some (line.Substring(8))
    | _ -> None

let (|IsCat|_|) (line : string) =
    match line.ToUpper().StartsWith("CAT:") with
    | true when line.Length > 5 -> Some (line.Substring(5))
    | _ -> None

let (|IsHolly|_|) (line : string) =
    match line.ToUpper().StartsWith("HOLLY:") with
    | true when line.Length > 7 -> Some (line.Substring(7))
    | _ -> None

let characterLine line =
    match line with
    | IsLister line -> Some (Lister, line)
    | IsRimmer line -> Some (Rimmer, line)
    | IsCat line -> Some (Cat, line)
    | IsKryten line -> Some (Kryten, line)
    | IsHolly line -> Some (Holly, line)
    | _ -> None

let characterLines = 
    episodeLines
    |> Seq.choose (fun (season, episode, line) ->
        let result = characterLine line
        match result with
        | Some (character, line) -> Some (season, episode, character, line)
        | None -> None)
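As an aside, the five near-identical patterns above could probably be collapsed into a single parameterised partial active pattern. The sketch below is only an illustration under that assumption: `SpokenBy` and `describe` are hypothetical names that do not appear in the notebook, and the speech is trimmed rather than cut at a fixed offset.

```fsharp
open System

// Hypothetical parameterised partial active pattern (not part of the notebook):
// matches when the line starts with "<NAME>:" and returns the speech that follows.
let (|SpokenBy|_|) (name : string) (line : string) =
    let prefix = name + ":"
    if line.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
       && line.Length > prefix.Length + 1 then
        Some (line.Substring(prefix.Length).Trim())
    else
        None

// The character name is supplied at the point of use
let describe (line : string) =
    match line with
    | SpokenBy "LISTER" speech -> sprintf "Lister says: %s" speech
    | SpokenBy "CAT" speech -> sprintf "Cat says: %s" speech
    | _ -> "unattributed"

printfn "%s" (describe "LISTER: No, it's a chicken.")
printfn "%s" (describe "SCENE: The sleeping quarters.")
```

The trade-off is that the character name becomes a string argument rather than part of the pattern's name, so a typo in the name would only surface at runtime as a line that never matches.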

For each line that was successfully associated with a character, a call was made to CoreNLP (hosted in a local container) to calculate sentiment for the line:

In [17]:
type SentimentResponse = JsonProvider<"https://raw.githubusercontent.com/ibebbs/RedDwarfAnalysis/master/SentimentResponse.json">

let lineSentiment (line : string) =    
    printfn "Getting sentiment for line: %s" line
    let response = 
        Http.RequestString ( 
            "http://xi:9000/?properties={\"annotators\": \"tokenize,ssplit,sentiment\", \"date\": \"2017-04-02T15:13:00\", \"outputFormat\": \"json\"}&pipelineLanguage=en",
            headers = [ "Content-Type", "text/plain;charset=UTF-8" ],
            httpMethod = "POST", 
            body = TextRequest line)
    let result = SentimentResponse.Parse(response);
    result.Sentences |> Seq.averageBy (fun s -> float (s.SentimentValue - 2))

let sentimentLines =
    characterLines
    |> Seq.where (fun (season, episode, character, line) -> not (String.IsNullOrWhiteSpace(line)))
    |> Seq.map (fun (season, episode, character, line) -> 
        let sentiment = lineSentiment line
        (season, episode, character, line, sentiment))
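To make the scoring arithmetic concrete, here is a small sketch of the per-line calculation on its own, without the HTTP call. It assumes, as I understand CoreNLP's sentiment annotator, that each sentence receives a `sentimentValue` from 0 (very negative) to 4 (very positive); `lineScore` and the sample values are hypothetical.

```fsharp
// Assumption for illustration: each sentence value runs from 0 (very negative)
// to 4 (very positive), so subtracting 2 centres neutral sentences on zero.
let lineScore (sentenceValues : int list) =
    sentenceValues |> List.averageBy (fun v -> float (v - 2))

// A line with one negative, one neutral and one positive sentence averages out to neutral
let mixed = lineScore [ 1; 2; 3 ]

// A line with one positive and one very positive sentence
let upbeat = lineScore [ 3; 4 ]

printfn "%.2f %.2f" mixed upbeat
```

Under that assumption a line's score is simply the mean of its centred sentence scores, so a long, mostly neutral speech dilutes the effect of a single cutting remark.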

Finally, the resulting data was saved to a CSV for analysis (disabled for Jupyter):

In [18]:
let sentimentCsv =
    sentimentLines
    |> Seq.map (fun (season, episode, character, line, sentiment) -> sprintf "%i,%i,%s,%f,\"%s\"" season episode (characterName character) sentiment line)
    
//File.WriteAllLines(Path.Combine(location, "Sentiment.csv"), sentimentCsv)


## Analysis

With sentiment calculated for all valid lines for all episodes, we could start performing some analysis on the data. To load the data, I used [FSharp.Data's CsvProvider](http://fsharp.github.io/FSharp.Data/library/CsvProvider.html) to parse the CSV created in the previous steps into an easily manipulatable record:

In [3]:
type SentimentType = CsvProvider<"https://raw.githubusercontent.com/ibebbs/RedDwarfAnalysis/master/Sentiment.csv", Schema = "Season,Episode,Character,Sentiment (float),Line (string)">
let sentiment = SentimentType.Load("https://raw.githubusercontent.com/ibebbs/RedDwarfAnalysis/master/Sentiment.csv")

To start with, I wanted to see visually whether there are any obvious changes in average episode sentiment across the seasons.

_Note how we force the ordering of characters in the chart. This is done to ensure Lister (who is the only character to have been in every episode) is first, thereby ensuring correct ordering of episodes. If this isn't done, then the chart can be seen to 'double back' on itself when characters miss episodes._

In [17]:
let characterOrder =
    ["LISTER"; "RIMMER"; "CAT"; "KRYTEN"; "HOLLY" ]
    |> Seq.mapi (fun index character -> (character, index))
    |> Map.ofSeq

let options = 
  Options(
    pointSize=3,  
    trendlines=[|
      Trendline(opacity=0.5,lineWidth=5);
      Trendline(opacity=0.5,lineWidth=5);
      Trendline(opacity=0.5,lineWidth=5);
      Trendline(opacity=0.5,lineWidth=5);
      Trendline(opacity=0.5,lineWidth=5)|],     
    hAxis=Axis(title="Episode"),
    vAxis=Axis(title="Sentiment", format = "0.00"))

sentiment.Rows
|> Seq.groupBy (fun s -> s.Character)
|> Seq.sortBy (fun (character, lines) -> characterOrder.[character])
|> Seq.map (fun (character, lines) -> 
    lines 
    |> Seq.groupBy(fun l -> sprintf "S%sE%s" (l.Season.ToString("00")) (l.Episode.ToString("00"))) 
    |> Seq.map (fun (episode, l) -> (episode, l |> Seq.averageBy(fun l -> (l.Sentiment + 1.0) / 2.0)))
    |> Seq.sortBy (fun (episode, sentiment) -> episode))
|> Chart.Line
|> Chart.WithOptions (options)
|> Chart.WithLabels ["LISTER"; "RIMMER"; "CAT"; "KRYTEN"; "HOLLY"]
|> Chart.WithTitle "Character Average Sentiment Per Episode"

As can be seen, apart from Holly getting a little jaded towards the end of series 5 (Quarantine, which is actually a bit of an outlier as she only has a few lines in this episode), there doesn't seem to be any significant trend in changing sentiment. Given we're interested in the change across seasons, how about we look at the average sentiment of each character in each season:

In [18]:
sentiment.Rows
|> Seq.groupBy (fun s -> s.Character)
|> Seq.sortBy (fun (character, lines) -> characterOrder.[character])
|> Seq.map (fun (character, lines) -> 
    lines 
    |> Seq.groupBy(fun l -> sprintf "S%s" (l.Season.ToString("00"))) 
    |> Seq.map (fun (episode, l) -> (episode, l |> Seq.averageBy(fun l -> (l.Sentiment + 1.0) / 2.0)))
    |> Seq.sortBy (fun (episode, sentiment) -> episode))
|> Chart.Line
|> Chart.WithOptions (options)
|> Chart.WithLabels ["LISTER"; "RIMMER"; "CAT"; "KRYTEN"; "HOLLY"]
|> Chart.WithTitle "Character Average Sentiment Per Season"

Well, that's interesting. With the exception of Kryten (and excluding Holly's return) the average sentiment of all the main characters generally trends down over the seasons. I would therefore imagine the season sentiment overall would follow a similar pattern:

In [20]:
sentiment.Rows
|> Seq.groupBy(fun l -> sprintf "S%s" (l.Season.ToString("00"))) 
|> Seq.map (fun (episode, l) -> (episode, l |> Seq.averageBy(fun l -> (l.Sentiment + 1.0) / 2.0)))
|> Seq.sortBy (fun (episode, sentiment) -> episode)
|> Chart.Line
|> Chart.WithOptions (options)
|> Chart.WithLabel "Sentiment"
|> Chart.WithTitle "Average Sentiment Per Season"

Which quite accurately reflects my opinion of the show's deterioration. Perhaps I no longer appreciate the humour as it has simply become more cynical over the last few seasons?

Note to self: it would have been interesting to record my rating for each show / season prior to starting this investigation which I could then have compared to the above.

To see if there's any correlation between sentiment and rating, I'm going to pull in the rating parsing code authored for [part 1](https://notebooks.azure.com/n/bSdVlvDz5sI/notebooks/Investigation.ipynb) of this analysis.

In [21]:
let ratingCategoryNames = [
  "Males";
  "Females";
  "Aged under 18";
  "Males under 18";
  "Aged 18-29";
  "Males Aged 18-29";
  "Females Aged 18-29";
  "Aged 30-44";
  "Males Aged 30-44";
  "Females Aged 30-44";
  "Aged 45+";
  "Males Aged 45+";
  "Females Aged 45+";
  "Top 1000 voters";
  "US users";
  "Non-US users";
]

type RatingCategory =
  | ``Males`` = 0
  | ``Females`` = 1
  | ``Aged under 18`` = 2
  | ``Males under 18`` = 3
  | ``Aged 18-29`` = 4
  | ``Males Aged 18-29`` = 5
  | ``Females Aged 18-29`` = 6
  | ``Aged 30-44`` = 7
  | ``Males Aged 30-44`` = 8
  | ``Females Aged 30-44`` = 9
  | ``Aged 45`` = 10
  | ``Males Aged 45`` = 11
  | ``Females Aged 45`` = 12
  | ``Top 1000 voters`` = 13
  | ``US users`` = 14
  | ``Non-US users`` = 15

type EpisodeRatings = {
    Id : string;
    Category : RatingCategory;
    Votes : int;
    Rating : decimal
}

let parseCategory c =
  let index = Seq.tryFindIndex (fun cn -> cn = c) ratingCategoryNames
  match index with
  | Some x -> Some (enum<RatingCategory>(x))
  | None -> None

let parseRatings id =
  let title (node : HtmlNode) =
      node.Descendants["a"]
      |> Seq.map (fun d -> d.InnerText())
  
  let votes (node : HtmlNode) =
      [ node.InnerText() ]
  
  let rating (node : HtmlNode) =
      [ node.InnerText() ]
  let document = HtmlDocument.Load("https://raw.githubusercontent.com/ibebbs/RedDwarfAnalysis/master/Ratings/" + id + ".html")
  let content = document.CssSelect("#tn15content").[0]
  let tables = 
    content.Descendants["table"]
    |> Seq.toArray
  let rows =
    tables.[1].Descendants["tr"]
    |> Seq.map (fun row -> (row, row.Descendants["td"] |> Seq.toArray))
    |> Seq.where (fun (row, data) -> data.Length = 3)
    |> Seq.map (fun (row, data) -> ( (title data.[0]), (votes data.[1]), (rating data.[2])))
    |> Seq.collect (fun (t, v, r) -> Seq.zip3 t v r)
    |> Seq.map (fun (t, v, r) -> ((parseCategory t), System.Int32.Parse(v.Trim()), System.Decimal.Parse(r.Trim())))
    |> Seq.where (fun (t, v, r) -> t.IsSome)
    |> Seq.map (fun (t, v, r) -> { Id = id; Category = t.Value; Votes = v; Rating = r })
  rows

Which allows us to create a comparison chart:

In [28]:
let ``Average Sentiment Per Season`` = 
    sentiment.Rows
    |> Seq.groupBy(fun l -> sprintf "S%s" (l.Season.ToString("00"))) 
    |> Seq.map (fun (season, l) -> (season, l |> Seq.averageBy(fun l -> (l.Sentiment + 1.0) / 2.0)))
    |> Seq.sortBy (fun (season, sentiment) -> season)
    |> Seq.toArray

let ``Top 1000 voters - Average Rating Per Season`` =
    episodeSources
    |> Seq.where (fun s -> s.Transcript.IsSome)
    |> Seq.collect (fun es -> 
        parseRatings es.Id
        |> Seq.where (fun r -> r.Category = RatingCategory.``Top 1000 voters``)
        |> Seq.map (fun r -> (es.Season, es.Episode, (float) r.Rating / 10.0)))
    |> Seq.groupBy (fun (season, episode, rating) -> sprintf "S%s" (season.ToString("00")))
    |> Seq.map (fun (season, ratings) -> (season, ratings |> Seq.averageBy (fun (season, episode, rating) -> rating)))
    |> Seq.sortBy (fun (season, rating) -> season)
    |> Seq.toArray

[``Average Sentiment Per Season``; ``Top 1000 voters - Average Rating Per Season``]
|> Chart.Line
|> Chart.WithOptions (options)
|> Chart.WithLabels ["Sentiment"; "Rating"]
|> Chart.WithTitle "Average Rating vs Average Sentiment Per Season"

Unfortunately, while it seems there may be a correlation, it's quite difficult to see the magnitude of the correlation in this chart. To dig into this further we'll start by mapping sentiment vs rating on a scatter chart:

In [31]:
``Top 1000 voters - Average Rating Per Season``
|> Seq.map (fun (season, rating) -> rating)
|> Seq.zip (``Average Sentiment Per Season`` |> Seq.map (fun (season, sentiment) -> sentiment))
|> Chart.Scatter
|> Chart.WithOptions (Options(trendlines = [| Trendline() |], vAxis = Axis(title = "Rating"), hAxis = Axis(title = "Sentiment")))
|> Chart.WithTitle "Sentiment vs Rating"

As we can see, while there are a couple of interesting groupings, there is a general trend suggesting higher sentiment scores get higher average ratings. This can be quantified using the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) which is:

> a measure of the linear correlation between two variables X and Y [and] has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation

The math behind the coefficient is fairly involved but fortunately the [MathNet.Numerics](https://numerics.mathdotnet.com/) library (included as a dependency of the [FsLab](https://fslab.org/) package) provides a function for calculating this value as follows:

In [32]:
Correlation.Pearson(
    ``Average Sentiment Per Season`` |> Seq.map (fun (season, sentiment) -> sentiment), 
    ``Top 1000 voters - Average Rating Per Season`` |> Seq.map (fun (season, rating) -> rating)
)

0.438332006

A value of 1 would represent a perfect correlation between increased sentiment and increased rating, a value of -1 would represent a perfect correlation between decreased sentiment and increased rating, and a value of 0 would represent no correlation at all between sentiment and rating. Our calculated value of `0.438332006` therefore represents a moderate positive correlation between increased sentiment and increased rating.
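For the curious, the coefficient is the covariance of the two series divided by the product of their spreads. Below is a minimal hand-rolled sketch of that calculation, for illustration only: `pearson` is a hypothetical helper (the notebook relies on `Correlation.Pearson`), it assumes both lists have the same length, and the values passed to it are made up rather than taken from the analysis.

```fsharp
// Hand-rolled Pearson coefficient, for illustration only.
// Assumes xs and ys have the same length and neither is constant.
let pearson (xs : float list) (ys : float list) =
    let meanX = List.average xs
    let meanY = List.average ys
    // Sum of the products of each pair's deviations from the means
    let covariance =
        List.map2 (fun x y -> (x - meanX) * (y - meanY)) xs ys |> List.sum
    // Sums of squared deviations for each series
    let spreadX = xs |> List.sumBy (fun x -> (x - meanX) ** 2.0)
    let spreadY = ys |> List.sumBy (fun y -> (y - meanY) ** 2.0)
    covariance / sqrt (spreadX * spreadY)

// Made-up values: ys is exactly twice xs, so the relationship is perfectly linear
let r = pearson [ 1.0; 2.0; 3.0; 4.0 ] [ 2.0; 4.0; 6.0; 8.0 ]
printfn "%.3f" r
```

Because the second series is an exact multiple of the first, this prints `1.000`; reversing one of the lists would give `-1.000`.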

## Conclusion

Given that, as we saw above, sentiment has generally declined over the seasons and this has been reflected in the ratings, perhaps the conclusion from my first investigation - that I was "a miserable old git" - was incorrect and I am actually (as I like to believe) an optimist who simply dislikes cynicism.

In fact, from the above, it seems it is the crew of Red Dwarf who have become **miserable old gits**.

Yes, I like that conclusion much better.