In [29]:
# Setting up a custom stylesheet in IJulia
styl = read("style.css", String) # Read a .css file in the same folder as this notebook file
HTML(styl) # Output as HTML

SystemError: SystemError: opening file "style.css": No such file or directory

<h2>Importing data set</h2>


In [30]:
using DelimitedFiles
wikiEVDraw = DelimitedFiles.readdlm("P3006571-20231005190000-PSL-Alameda.csv", ',')


7204×34 Matrix{Any}:
  "\ufeffµPMU location"        "PSL-Alameda"  …    ""                    ""
  "µPMU latitude"            44.4791               ""                    ""
  "µPMU longitude"          -73.198                ""                    ""
  "sample interval (msec)"     "date stamp"        "frequency C37 L1-E"  ""
 8.33333                       "2023/10/5"       60.0245                 ""
 8.33333                       "2023/10/5"    …  60.0353                 ""
 8.33333                       "2023/10/5"       60.0331                 ""
 8.33333                       "2023/10/5"       60.0261                 ""
 8.33333                       "2023/10/5"       60.0272                 ""
 8.33333                       "2023/10/5"       60.0237                 ""
 ⋮                                            ⋱                          
 8.33333                       "2023/10/5"    …  60.0119                 ""
 8.33333                       "2023/10/5"       60.0253             

[Back to the top](#In-this-lecture)

<h2>Converting a date string to DateTime format</h2>

Remarkably enough, data on dates and times are among the fiddliest things a data scientist has to deal with. There are a huge number of different ways in which such data are reported, and moreover there are conflicting standards for how to deal with irregularities (month lengths aren't all the same, some years are leap years, time zones shift ...).

In consequence, every computer language that deals with date-time data has a rich array of functions to deal with it. In Julia, they are in a package called Dates. Of this package, we will use the functions ``DateTime()`` and ``Dates.datetime2rata()``.

Why does one of them use the dot syntax and the other does not? The answer is that when you start up Julia, only a few of the functions in the package Dates are visible. These functions include ``DateTime()`` but not ``datetime2rata()``. However, we are able to access the other functions via the dot notation. We will talk more about packages in the next lecture.

The ``DateTime()`` function uses a format string to convert string data, such as we see in column one, into Julia DateTime data.

A format string is something one sees in many computing contexts. Here, it tells Julia in what form to expect the data. The date strings in the data have a number for the day, then a space, then an abbreviation for the month, then a space, then a number for the year. The appropriate format string is therefore ``"d u y"``. These formats have limitations: ``d`` accepts one- and two-digit days (which should always work) and ``y`` accepts two- and four-digit years (which should mostly work), but ``u`` accepts only three-letter abbreviations. Unfortunately, data in which the month names are abbreviated differently are fairly common, and they will need a different format string.

Here is an example of how the conversion works:


In [31]:
using Dates
Dates.DateTime(wikiEVDraw[1,1], "d u y")

ArgumentError: ArgumentError: Unable to parse date time. Expected directive DatePart(d) at char 1
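
The error above arises because ``wikiEVDraw[1,1]`` holds the header text ``"\ufeffµPMU location"`` rather than a date, so the ``d`` directive finds no digits at character 1. As a minimal sketch of format-string parsing, the block below uses two literal strings: one in the day, month, year layout described above, and one in the ``2023/10/5`` layout visible in the "date stamp" column of the output. The format string ``"y/m/d"`` is an assumption inferred from that displayed value; the data file itself does not document it.

```julia
using Dates

# Day, three-letter month abbreviation, year, separated by spaces
d1 = DateTime("22 Mar 2014", "d u y")

# Year/month/day separated by slashes
# (assumed layout of the "date stamp" column, inferred from the displayed values)
d2 = DateTime("2023/10/5", "y/m/d")

println(d1)
println(d2)
```

Both calls return ``DateTime`` values with the time of day set to midnight, because the strings carry no time information.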

[Back to the top](#In-this-lecture)

<h2>"for" loops</h2>

Now we need to do this conversion for every element in column 1 of the matrix. The way to do this is with a ``for`` loop.

``for`` loops are extremely important in computing, and in Julia even more so. This is because many items that are vectorised in languages like Matlab and Python are explicitly computed in ``for`` loops in Julia. It may surprise many of you who know about speeding up computations using vectorisation, but it is frequently the case that a loop in Julia runs *faster* than the equivalent vectorised code.

``for`` loops have a simple structure: the outside is the ``for ... end`` part and the inside is a code block executed repeatedly. Exactly how many times is determined by the ``for ... end`` part.

In the two examples below, we use ``println()`` to show the value of the variable over which the ``for`` loop runs. Notice that these values do not have to be a sequence of integers.


In [32]:
for num = 3:7    # here, the colon is used to specify a range; we will see this again
    println("num is now $num")       # remember that the special character `$` is used for string interpolation
end

testvalues = [23, "my name is not a name", 'ℵ']      # an array with some rather odd elements
for x in testvalues    # a for loop can iterate over an array
    println("The value of x is now $x")
end

num is now 3
num is now 4
num is now 5
num is now 6
num is now 7
The value of x is now 23
The value of x is now my name is not a name
The value of x is now ℵ


It is important to get the first line of a ``for`` loop exactly right. It has the structure 

"variable = iterable"

Here, "iterable" is anything that is arranged in a sequence. Not all types are, but they certainly include ranges (created with the colon operator ``:``) and any single dimension of an array. The ``=`` is an assignment operator, and it assigns to "variable" the values in "iterable", one after the other. That is, during each pass through the loop, "variable" has the value of exactly one of the items in "iterable". 
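
As a further sketch of the "variable = iterable" pattern, the block below iterates over one column of a small made-up matrix and over a range with a step. The names ``sumcolumn``, ``A`` and ``odds`` are purely illustrative. The summing loop is wrapped in a function so that the accumulator ``total`` is an ordinary local variable.

```julia
# Sum the entries of one column of a matrix with a for loop
function sumcolumn(A, col)
    total = zero(eltype(A))
    for v in A[:, col]      # A[:, col] is a one-dimensional slice, so it is iterable
        total += v          # v takes each value in the slice, one after the other
    end
    return total
end

A = [1 2; 3 4; 5 6]         # a 3×2 matrix of integers
println(sumcolumn(A, 2))

# A range may also have a step, written start:step:stop
odds = Int[]
for k = 1:2:9
    push!(odds, k)          # collect the values that k takes
end
println(odds)
```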

[Back to the top](#In-this-lecture)

<h2>Converting column 1 to DateTime type</h2>

Now we use a ``for`` loop twice. First, we use array slicing to create a one-dimensional array containing just column one, ready for conversion to values of DateTime type.


In [33]:
col1 = wikiEVDraw[:, 1]  # the colon means all the data in the column, the 1 means the first column

7204-element Vector{Any}:
  "\ufeffµPMU location"
  "µPMU latitude"
  "µPMU longitude"
  "sample interval (msec)"
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333
 ⋮
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333

We use a ``for`` loop to overwrite the data in the variable ``col1`` with converted data using ``DateTime()`` as follows:

In [34]:
for i = 1:length(col1)
    col1[i] = Dates.DateTime(col1[i], "d u y")  # note that this replaces the previous value in col1[i]
end

ArgumentError: ArgumentError: Unable to parse date time. Expected directive DatePart(d) at char 1
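
The loop stops at its first element, because ``col1[1]`` is the header text and not a date string. In the matrix displayed earlier, column 1 holds the sample interval, while the date strings appear under "date stamp" in column 2, below the header rows. The block below is a sketch of the same loop under those assumptions. The matrix ``raw`` is a simplified, hypothetical stand-in with two header rows (the second data row is made up), and ``convertdates``, ``firstrow`` and the format ``"y/m/d"`` are illustrative choices, not part of the original notebook.

```julia
using Dates

# Hypothetical stand-in for the layout of the matrix shown earlier:
# header rows first, then data rows with the date stamp in column 2
raw = Any["µPMU location"          "PSL-Alameda";
          "sample interval (msec)" "date stamp";
          8.33333                  "2023/10/5";
          8.33333                  "2023/10/6"]

# Convert the entries of `column` from row `firstrow` onwards to DateTime
function convertdates(column, firstrow, fmt)
    converted = Vector{DateTime}(undef, length(column) - firstrow + 1)
    for i = firstrow:length(column)
        converted[i - firstrow + 1] = DateTime(column[i], fmt)
    end
    return converted
end

datecol = convertdates(raw[:, 2], 3, "y/m/d")   # skip the two header rows
println(datecol)
```

Writing the results into a new ``Vector{DateTime}`` rather than overwriting the ``Vector{Any}`` gives the result a concrete element type, and leaves the original strings available if the format string turns out to be wrong.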

In [35]:
col1  # let's view it!

7204-element Vector{Any}:
  "\ufeffµPMU location"
  "µPMU latitude"
  "µPMU longitude"
  "sample interval (msec)"
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333
 ⋮
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333
 8.333333

<h2>Creating data giving time in days since 22 March 2014</h2>


Finally, we create the variable "epidays". This calls to mind the concept of *epidemic day*, which is simply a way to indicate how long an epidemic has been running. We will assume that the epidemic started on 22 March 2014, with a total of 49 cases, because that is the first date for which we have data.


Note that this is in keeping with the spirit of modelling: we are trying to do the best we can with the data we have. Even when we know that the epidemic has been traced back to a single case in early December 2013, that information is not in the table of data before us. We should not forget about it, but neither should we attempt to include it in the data.

The function we use is ``Dates.datetime2rata()``. The "Rata Die days" format is a specialised date format we will not discuss here (see https://en.wikipedia.org/wiki/Rata_Die for information). The important thing is that this function, applied to a given date, gives the day number of that date counted from 1 January of the year 0001, which is itself day 1. It is used as follows:


In [36]:
Dates.datetime2rata(col1[1])

MethodError: MethodError: no method matching datetime2rata(::SubString{String})

Closest candidates are:
  datetime2rata(!Matched::TimeType)
   @ Dates C:\Users\marif\AppData\Local\Programs\Julia-1.9.3\share\julia\stdlib\v1.9\Dates\src\conversions.jl:99
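
The ``MethodError`` shows that ``col1[1]`` is still a string: ``datetime2rata()`` accepts only date or date-time values. As a minimal sketch, the block below applies the function to ``DateTime`` values built from literal strings. The second date, 1 April 2014, is chosen only for illustration.

```julia
using Dates

day1 = DateTime(1, 1, 1)                        # 1 January of the year 0001
println(Dates.datetime2rata(day1))              # the Rata Die number of that day

start = DateTime("22 Mar 2014", "d u y")
later = DateTime("1 Apr 2014", "d u y")         # illustrative second date

# The difference of two Rata Die numbers is the number of days between the dates
elapsed = Dates.datetime2rata(later) - Dates.datetime2rata(start)
println(elapsed)
```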


<h2>Exporting the converted data</h2>


We create a function to express the number of days since epidemic day zero, which is the value of ``col1[54]``, namely 22 March 2014.

Then we iterate that function with a for loop over all the elements in col1 to create epidays. Note that the variable epidays is created before the start of the loop. This is, in general, good practice: if you know what array you want to fill, then initialise that array before you start filling it.

In [37]:
dayssincemar22(x) = Dates.datetime2rata(x) - Dates.datetime2rata(col1[54])
epidays = Array{Int64}(undef,54)
for i = 1:length(col1)
    epidays[i] = dayssincemar22(col1[i])
end


MethodError: MethodError: no method matching datetime2rata(::SubString{String})

Closest candidates are:
  datetime2rata(!Matched::TimeType)
   @ Dates C:\Users\marif\AppData\Local\Programs\Julia-1.9.3\share\julia\stdlib\v1.9\Dates\src\conversions.jl:99
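
This is the same ``MethodError`` as before, since ``col1`` still holds unconverted values. Note also that ``epidays`` is allocated with 54 slots, whereas ``col1`` has 7204 elements according to the output above, so the array is better sized from the data itself. The block below is a sketch of that pattern on three illustrative dates; the names ``dayssince``, ``elapseddays`` and ``demodates`` are hypothetical and not part of the original notebook.

```julia
using Dates

# Number of days from the reference date `ref` to the date `x`
dayssince(x, ref) = Dates.datetime2rata(x) - Dates.datetime2rata(ref)

function elapseddays(dates, ref)
    out = Array{Int64}(undef, length(dates))   # initialise with the length of the data
    for i = 1:length(dates)
        out[i] = dayssince(dates[i], ref)
    end
    return out
end

# Illustrative dates only
demodates = [DateTime(2014, 3, 22), DateTime(2014, 3, 24), DateTime(2014, 4, 1)]
demodays = elapseddays(demodates, demodates[1])
println(demodays)
```

Passing the reference date as an argument, instead of reading it from a fixed index such as ``col1[54]``, keeps the function usable when the data have a different number of rows.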


Finally, we overwrite column 1 of our data array with ``epidays``, and save it using ``writedlm()``. It is a good idea to use a new filename, so that all the work that went into extracting the data from Wikipedia is not lost. You never know when you might want the original dates again!

In [38]:
wikiEVDraw[:, 1] = epidays
DelimitedFiles.writedlm("wikipediaEVDdatesconverted.csv", wikiEVDraw, ',')  
#         note the delimiter ... the Julia default is a tab; to get .csv, we must specify the comma

DimensionMismatch: DimensionMismatch: tried to assign 54-element array to 7204×1 destination
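
The ``DimensionMismatch`` arises because a column can only be overwritten by a vector with exactly as many elements as the matrix has rows. The block below is a sketch of the overwrite-and-save step on a small made-up table, written to a temporary directory so that no existing file is touched. The names ``table``, ``newcol`` and ``demo_converted.csv`` are hypothetical.

```julia
using DelimitedFiles

table = Any["a" 1.5; "b" 2.5; "c" 3.5]   # small made-up table with 3 rows
newcol = [10, 20, 30]                     # replacement for column 1

# The replacement must have one element per row of the table
if length(newcol) == size(table, 1)
    table[:, 1] = newcol
end

path = joinpath(mktempdir(), "demo_converted.csv")
writedlm(path, table, ',')                # specify the comma to get a .csv file

roundtrip = readdlm(path, ',')            # read the file back to check its contents
println(roundtrip)
```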

[Back to the top](#In-this-lecture)