# Ch7 Notes - Handling time-series and missing data 
This chapter introduces scraping data from websites and accessing APIs over HTTP, handling the missing data that often turns up in scraped data, dealing with errors when data is unavailable, performing basic statistical calculations, and plotting the analysed data while properly handling missing values.

### API structure
To access the data, we first need to understand how the API expects requests to be made. So-called "REST" APIs typically require a URL query of the form `<api>/<sitename>/<database>/<table>/<year-identifier>/?format=json`, where the specific fields change based on what you're trying to retrieve, e.g. the human genome database, the 39th assembly, chromosome positions. Most sites publish a specification describing how to do this.
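
As a rough illustration of how such a URL can be assembled from its parts, the sketch below rebuilds the exchange-rate query used later in these notes. The helper name `build_query` and its argument names are hypothetical, chosen only for this example; the URL layout is simply the one the NBP query in this chapter follows.

```julia
using Dates

# Hypothetical helper (not part of any package): joins the URL pieces
# with "/" and appends the format specifier.
function build_query(base, table, currency, date::Date)
    parts = [base, table, currency, Dates.format(date, "yyyy-mm-dd")]
    return join(parts, "/") * "/?format=json"
end

url = build_query("https://api.nbp.pl/api/exchangerates/rates", "a", "usd", Date(2020, 6, 1))
println(url)
```

Keeping the date as a `Date` rather than a string makes it harder to pass a malformed date by accident.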

Let's do some retrieval! We'll need the HTTP.jl package.

In [3]:
using Pkg; using HTTP

In [6]:
http_query = HTTP.get("https://api.nbp.pl/api/exchangerates/rates/a/usd/2020-06-01/?format=json")

HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Date: Fri, 28 Jun 2024 03:47:08 GMT
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/json; charset=utf-8
Expires: -1
ETag: "ZVW72VHmEFszyVJaD6HB4ZuqPBFnr2L+4FwjIJloFkk="
Vary: Accept-Encoding
Content-Encoding: gzip
Transfer-Encoding: chunked

{"table":"A","currency":"dolar amerykański","code":"USD","rates":[{"no":"105/A/NBP/2020","effectiveDate":"2020-06-01","mid":3.9680}]}"""

Load up JSON3.jl too

In [5]:
using JSON3

Parse the body of the response using JSON3.read

In [7]:
json_query = JSON3.read(http_query.body) 

JSON3.Object{Vector{UInt8}, Vector{UInt64}} with 4 entries:
  :table    => "A"
  :currency => "dolar amerykański"
  :code     => "USD"
  :rates    => Object[{…

Take a look at the http_query body to get a sense of how the data itself is stored (as raw bytes)

In [8]:
http_query.body

134-element Vector{UInt8}:
 0x7b
 0x22
 0x74
 0x61
 0x62
 0x6c
 0x65
 0x22
 0x3a
 0x22
 0x41
 0x22
 0x2c
    ⋮
 0x64
 0x22
 0x3a
 0x33
 0x2e
 0x39
 0x36
 0x38
 0x30
 0x7d
 0x5d
 0x7d

If we want a human-readable format, we can convert it to a String using the **String()** function

In [9]:
http_string = String(http_query.body)

"{\"table\":\"A\",\"currency\":\"dolar amerykański\",\"code\":\"USD\",\"rates\":[{\"no\":\"105/A/NBP/2020\",\"effectiveDate\":\"2020-06-01\",\"mid\":3.9680}]}"

**"Calling the String constructor on Vector{UInt8} consumes the data stored in a vector. The benefit of this behavior is that the operation is very fast. The downside is that you can perform the conversion only once. After the operation, the response.body vector is empty, so calling String(response.body) again would produce an empty string ("")."** --- **"The fact that the String constructor empties the Vector{UInt8} source that is passed to it is one of the rare cases in Julia when a function mutates an object passed to it that does not have the ! suffix in its name. Therefore, it is important that you remember this exception. In our example, if you wanted to preserve the value stored in response.body, you should have copied it before passing it to the String constructor as follows: String(copy(response.body))."**
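
The behaviour described in the quote can be checked without making a request. The sketch below uses a small hand-written byte vector as a stand-in for a response body; the variable names are only for illustration.

```julia
# Stand-in for a response body: the three bytes of "USD".
bytes = UInt8[0x55, 0x53, 0x44]

# Converting a copy leaves the original vector intact.
kept = String(copy(bytes))
n_after_copy = length(bytes)

# Converting the vector itself empties it.
consumed = String(bytes)
n_after_consume = length(bytes)

println((kept, n_after_copy, consumed, n_after_consume))
```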

Now let's access the fields of our json query 

In [11]:
json_query.code

"USD"

In [18]:
json_query.rates

1-element JSON3.Array{JSON3.Object, Vector{UInt8}, SubArray{UInt64, 1, Vector{UInt64}, Tuple{UnitRange{Int64}}, true}}:
 {
              "no": "105/A/NBP/2020",
   "effectiveDate": "2020-06-01",
             "mid": 3.968
}

In [21]:
json_query.rates[1].mid

3.968

If we know that the array contains exactly one element, like the one above, we can use the Base function **only()** instead of indexing, and then access the field we're interested in

In [22]:
only(json_query.rates).mid

3.968
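
To see why `only` is safer than indexing with `[1]`, here is a minimal sketch of what happens when the single-element assumption does not hold. The two-element vector is made up purely for the demonstration.

```julia
rates = [3.968]
single = only(rates)   # exactly one element, so it is returned

# With more than one element, only throws an ArgumentError instead of
# silently picking the first value, as indexing with [1] would.
outcome = try
    only([1, 2])
catch e
    e isa ArgumentError ? "caught ArgumentError" : rethrow()
end

println((single, outcome))
```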

"The only function is quite useful when writing production code, as it allows you to easily catch bugs if your data does not meet the assumptions."

## Handling situations where an API query fails
Let's discuss how to handle exceptions so that they do not terminate our program when we don't want them to. For this, we use the try-catch-end block, which we'll wrap around our HTTP request. If the request fails with a status error, we get a `missing` value back instead of the whole program stopping, so we can handle the failure deliberately. This matters because our code will usually sit among other functions that don't depend on this request's result, and we'd prefer those to keep running.

In [23]:
query = "https://api.nbp.pl/api/exchangerates/rates/a/usd/" *
        "2020-06-01/?format=json"
 
try
    response = HTTP.get(query)
    json = JSON3.read(response.body)
    only(json.rates).mid
catch e
    if e isa HTTP.ExceptionRequest.StatusError
        missing
    else
        rethrow(e)
    end
end

3.968

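Now for a failed request. 2020-06-06 was a Saturday, presumably a day on which no rate is published, so the API responds with an error status: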

In [26]:
query = "https://api.nbp.pl/api/exchangerates/rates/a/usd/" *
        "2020-06-06/?format=json"
 
try
    response = HTTP.get(query)
    json = JSON3.read(response.body)
    only(json.rates).mid
catch e
    if e isa HTTP.ExceptionRequest.StatusError
        missing
    else
        rethrow(e)
    end
end

missing
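
The two cells above repeat the same block with a different date. One way to avoid the repetition is to wrap the pattern in a function. The sketch below shows the shape of such a function offline: a small dictionary stands in for the remote API, and `KeyError` plays the role of the status error. Both the table and the helper name `get_rate` are assumptions made for this example.

```julia
# Made-up stand-in for the remote API: a date is either present or absent.
table = Dict("2020-06-01" => 3.968)

# Hypothetical helper: returns the rate, or missing when the lookup fails
# in the one way we expect (KeyError). Any other exception is rethrown.
function get_rate(table, date)
    try
        table[date]
    catch e
        e isa KeyError ? missing : rethrow()
    end
end

println(get_rate(table, "2020-06-01"))
println(get_rate(table, "2020-06-06"))
```

Catching only the expected exception type keeps genuine bugs visible instead of turning them into `missing`.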

In summary, we generally want to reserve such try-catch blocks for exceptional circumstances and not rely on them too much. Our focus should instead be on writing robust code that minimizes the possibility of errors. This is a large subject.

## Working with missing data 
Real-life data will often have missing values in it: measurements are not always made, human error rears its head, and variance is common and should be expected. Julia provides the special value `missing` (of type `Missing`) for exactly this case. It is not simply 0, which is a definite value; a `missing` value exists but is unknown to us, which is an important difference. In contrast, the value `nothing` (of type `Nothing`) is used when there is genuinely no value. For instance, if we are collecting people's favorite basketball teams and ask someone who doesn't follow basketball, their response would be `nothing`.

#### Propagating missing values in functions
The concept of propagation is an important one to keep in mind. For instance, multiplying anything by 0 propagates 0 to the end result, no matter what. The `missing` value has the same effect: any arithmetic involving `missing` returns `missing`, discarding everything else, and this holds even for multiplication by 0!

In [30]:
1 + missing

missing

In [31]:
1 * missing

missing
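
Propagation also affects aggregate functions. As a small sketch with a made-up vector, a single `missing` entry makes the whole sum `missing`, while the Base function `skipmissing` lets us aggregate over the known values only.

```julia
xs = [1, missing, 3]

total = sum(xs)                     # propagates: the result is missing
total_known = sum(skipmissing(xs))  # skips the missing entry

println((total, total_known))
```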

It's also important to keep in mind that `missing` is propagated through comparisons such as equal to, greater than, and less than

In [32]:
missing == 0

missing

In [33]:
missing < 0

missing

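If we are unaware of missing values in our datasets, this can cause problems downstream when we perform row-wise operations, broadcasting, etc. In particular, using `missing` as the condition of an `if` statement throws an error. There is one caveat: logical operators follow three-valued logic, so they return a definite answer when the result does not depend on the unknown value. For example, `true | missing` is `true` (and likewise `false & missing` is `false`)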

In [36]:
true | missing 

true
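
The sketch below collects a few more cases of the logical operators, and shows what happens when a `missing` comparison result is used directly as a condition. The strings returned from the branches are placeholders for this example.

```julia
a = true | missing    # true: the result does not depend on the unknown value
b = false & missing   # false, for the same reason
c = false | missing   # missing: the result depends on the unknown value

cond = missing == 0   # missing, not true or false

# Branching on a missing condition throws a TypeError.
branch = try
    cond ? "zero" : "nonzero"
catch e
    e isa TypeError ? "cannot branch on missing" : rethrow()
end

println((a, b, c, branch))
```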

**"The design of handling missing in Julia requires you to explicitly decide whether missing should be treated as true or false. This is achieved with the coalesce function, which you might know from SQL (http://mng.bz/BZ1r). Its definition is simple: coalesce returns its first nonmissing positional argument, or missing if all its arguments are missing."**

"The use of coalesce is most common with handling logical conditions. If you write coalesce(condition, true), you say that if the condition evaluates to missing, you want this missing to be treated as true. Similarly, coalesce(condition, false) means that you want to treat missing as false. Here is an example:"

In [37]:
coalesce(missing, true)

true

In [38]:
coalesce.([1, missing, 3, true], true)

4-element Vector{Integer}:
    1
 true
    3
    1

The function essentially specifies how Julia will treat missing values: whether we want them to evaluate as `true` or `false` when they are encountered. Without it, they are propagated and returned as `missing`!
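
A common place where this decision comes up is filtering. The following sketch, using a made-up vector, keeps only the entries known to be greater than 2 by treating an unknown comparison result as `false`.

```julia
xs = [1, missing, 3]

mask = xs .> 2                  # [false, missing, true]
keep = coalesce.(mask, false)   # treat "unknown" as false
selected = xs[keep]

println(selected)
```

Passing `true` instead of `false` as the fallback would keep the `missing` entry as well.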


If we want to perform comparisons such as equality or less/greater than and always get a `true` or `false` answer, we can use the functions `isequal()` and `isless()`. As a rule of thumb, `isless` treats `missing` as greater than any number.

In [39]:
isless(4, missing)

true

In [40]:
isequal(missing,missing)

true
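
One practical consequence of this ordering shows up when sorting. As a small sketch with a made-up vector: `sort` uses `isless` by default, so `missing` values end up at the end, and `isequal` gives a definite answer where `==` does not.

```julia
eq_operator = missing == missing          # missing: == propagates
eq_function = isequal(missing, missing)   # true: isequal always returns a Bool

# sort uses isless by default, so missing is placed after all numbers.
sorted = sort([3, missing, 1])

println((eq_operator, eq_function, sorted))
```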