# Has COVID-19 negatively impacted reviews of scented candles?

First, we'll add some libraries to our classpath.

In [86]:
%%classpath add mvn
tech.tablesaw tablesaw-beakerx 0.38.1
tech.tablesaw tablesaw-excel 0.38.1
tech.tablesaw tablesaw-aggregate 0.38.1

And add some associated imports.

In [87]:
%import tech.tablesaw.api.*
%import tech.tablesaw.io.xlsx.XlsxReader
%import tech.tablesaw.selection.Selection

%import java.time.LocalDate
%import java.time.LocalDateTime
%import java.util.function.Function

%import static java.time.Month.JANUARY
%import static tech.tablesaw.aggregate.AggregateFunctions.mean
%import static tech.tablesaw.api.QuerySupport.and
%import static tech.tablesaw.io.xlsx.XlsxReadOptions.builder

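Next, a helper closure, since the two graphs are very similar. For a given spreadsheet URL it returns a line of monthly average ratings and a scatter of per-date average ratings, restricted to the top 3 candles from 2017 onward.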

In [88]:
// NOTE: lineColor and markerColor are accepted but not applied to the plot items below.
plots = { String url, String lineColor, String markerColor ->
    // Read the spreadsheet straight from the URL
    def table = new XlsxReader().read(builder(new URL(url)).build())

    // Bucket each review into its month by pinning the date to the 15th
    table.addColumns(
        DateColumn.create('YearMonth', table.column('Date').collect { LocalDate.of(it.year, it.month, 15) })
    )
    def janFirst2017 = LocalDateTime.of(2017, JANUARY, 1, 0, 0)
    Function<Table, Selection> from2017 = { r -> r.dateTimeColumn('Date').isAfter(janFirst2017) }
    Function<Table, Selection> top3 = { r -> r.intColumn('CandleID').isLessThanOrEqualTo(3) }

    // Keep only reviews after 1 January 2017 for candles with a CandleID of 3 or less
    def filtered = table.sortAscendingOn('Date').where(and(from2017, top3))
    def byMonth = filtered.summarize('Rating', mean).by('YearMonth')
    def byDate = filtered.summarize('Rating', mean).by('Date')

    // Monthly mean as a line, mean per distinct 'Date' value as scatter points
    def averaged = new Line(x: byMonth.dateColumn('YearMonth').toList(), y: byMonth.nCol('Mean [Rating]').toList())
    def scatter = new Points(x: byDate.dateTimeColumn('Date').toList(), y: byDate.nCol('Mean [Rating]').toList())
    [averaged, scatter]
}
OutputCell.HIDDEN
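
To make the monthly bucketing in the helper easier to follow, here is a minimal sketch of the same idea in plain Groovy, without Tablesaw. The `reviews` list and the `monthlyMean` name are hypothetical sample data and naming, not values from the real spreadsheets; the sketch only mirrors how each timestamp is pinned to the 15th of its month before averaging.

```groovy
import java.time.LocalDate
import java.time.LocalDateTime

// Hypothetical sample reviews (illustrative only, not from the real data)
def reviews = [
    [date: LocalDateTime.of(2020, 1, 3, 9, 30),   rating: 5],
    [date: LocalDateTime.of(2020, 1, 28, 17, 0),  rating: 3],
    [date: LocalDateTime.of(2020, 2, 11, 12, 15), rating: 4],
]

// Pin every timestamp to the 15th of its month, as the helper does for 'YearMonth',
// then average the ratings that share a bucket
def monthlyMean = reviews
    .groupBy { LocalDate.of(it.date.year, it.date.month, 15) }
    .collectEntries { month, rows -> [month, rows*.rating.sum() / rows.size()] }

monthlyMean.each { month, avg -> println "$month -> $avg" }
```

Using the 15th as the representative day places each monthly average near the middle of its month on the time axis, which lines it up sensibly with the per-date scatter.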

Let's create a vertical line at 20 January 2020, the date used here for when COVID-19 was first reported. It spans the full rating range from 1 to 5.

In [89]:
def covidReported = LocalDateTime.of(2020, JANUARY, 20, 0, 0)
line = new Line(x: [covidReported]*2, y: [1, 5])
OutputCell.HIDDEN
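
The `[covidReported]*2` expression relies on Groovy's list repetition. As a small sketch (the names `marker`, `xs`, `ys` and `endpoints` are illustrative and not part of the notebook), this shows that the marker line is just two points sharing one x value:

```groovy
import java.time.LocalDateTime
import static java.time.Month.JANUARY

def marker = LocalDateTime.of(2020, JANUARY, 20, 0, 0)

// Multiplying a list by 2 repeats its contents: the same timestamp twice
def xs = [marker] * 2
def ys = [1, 5]

// Pair each x with its y to see the two endpoints of the vertical segment
def endpoints = [xs, ys].transpose()
endpoints.each { x, y -> println "x=$x, y=$y" }
```

Because both endpoints have the same x, the plotted segment is vertical and covers the rating axis from 1 to 5.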

Now the graph for scented candles:

In [90]:
def scentedUrl = 'https://github.com/paulk-asert/groovy-data-science/blob/master/subprojects/Candles/src/main/resources/Scented_all.xlsx?raw=true'
def (sAverage, sScatter) = plots(scentedUrl, 'seablue', 'lightskyblue')
plot = new Plot(title: "Top 3 scented candles Amazon reviews 2017-2020", xLabel: 'Date', yLabel: 'Average daily rating (1-5)')
plot << sAverage
plot << sScatter
plot << line

Now the graph for unscented candles:

In [91]:
def unscentedUrl = 'https://github.com/paulk-asert/groovy-data-science/blob/master/subprojects/Candles/src/main/resources/Unscented_all.xlsx?raw=true'
def (uAverage, uScatter) = plots(unscentedUrl, 'seagreen', 'lightgreen')
plot = new Plot(title: "Top 3 unscented candles Amazon reviews 2017-2020", xLabel: 'Date', yLabel: 'Average daily rating (1-5)')
plot << uAverage
plot << uScatter
plot << line