Problem with week time code #200

Closed
tamas-ferenci opened this issue Dec 28, 2020 · 12 comments

Comments

@tamas-ferenci

It seems eurostat (more specifically, eurotime2date) can't handle weekly data:

temp <- eurostat::get_eurostat("demo_r_mweek3")
#> Warning in eurotime2date(x, last = FALSE): Unknown time code, W. No date conversion was made.
#> 
#>             Please fill bug report at https://github.com/rOpenGov/eurostat/issues.
#> Table demo_r_mweek3 cached at C:\Users\FERENC~1\AppData\Local\Temp\RtmpsNaBz8/eurostat/demo_r_mweek3_date_code_FF.rds
@jhuovari

No, it doesn't. I think weekly data is a relatively new addition in Eurostat.

I thought that it would be easily fixed, but

However, there seems to be an ISOweek package: https://cran.r-project.org/web/packages/ISOweek/, which I guess gives the right dates. Or we could use the UK week definition (there is some difference in the starting week).

There also seems to be a week W99. How is that supposed to be treated?

@tamas-ferenci
Author

tamas-ferenci commented Dec 28, 2020

Yes, I personally decided to use the ISOweek package too in a similar situation. You definitely need the ISO 8601 standard; the metadata says - for my particular example - that "the definition of ‘week’ is given by ISO8601 week number" (https://ec.europa.eu/eurostat/cache/metadata/en/demomwk_esms.htm).

99 means that the week is not known (to cite the same source: "W99 means ‘unknown week’.").
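
For readers who want to see what the ISO 8601 week definition implies, here is a minimal base R sketch. It assumes time codes shaped like "2020W01" (four-digit year, "W", two-digit week), which is the layout implied by the substr() positions used later in this thread; the function name week_code_to_date is hypothetical and not part of {eurostat} or {ISOweek}. It returns the Monday of each week and NA for W99.

```r
# Hypothetical helper (not part of eurostat or ISOweek): convert weekly
# codes such as "2020W01" to the Monday of that ISO 8601 week.
week_code_to_date <- function(code) {
  year <- as.integer(substr(code, 1, 4))
  week <- as.integer(substr(code, 6, 7))
  # 4 January always falls in ISO week 1.
  jan4 <- as.Date(sprintf("%04d-01-04", year))
  # ISO weekday of 4 January: 1 = Monday, ..., 7 = Sunday.
  wday <- as.integer(format(jan4, "%u"))
  monday_w1 <- jan4 - (wday - 1L)
  out <- monday_w1 + 7L * (week - 1L)
  # W99 means "unknown week", so no date can be assigned.
  out[week == 99L] <- NA
  out
}

week_code_to_date(c("2020W01", "2020W53", "2020W99"))
# [1] "2019-12-30" "2020-12-28" NA
```

Note that week 1 of 2020 starts in December 2019, which is exactly the kind of edge case a non-ISO week definition would get wrong.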

@jhuovari

Since the time code is converted to a Date, to what date should W99 be converted? The last day of the last week?

@tamas-ferenci
Author

Very good question. Definitely not the last week, as it would imply that all people with an unknown death date died in the last week, i.e. they would be pooled together with those who actually died in the last week. I don't know whether it breaks any consistency within eurostat, but perhaps the clearest solution would be to set their date to NA...

@jhuovari

But then we would lose the year information.

I thought that the last week would have information on two dates: dated information on the first day, as normal, and the unknown values on the last day.

@tamas-ferenci
Author

Ah, I forgot that, you're completely correct.

I am no expert in designing such things, but what you outlined seems to be a possible solution. In that case, though, the user has to be informed very clearly what exactly those dates mean (and also, more generally, that although there is a concrete date, the data pertains to a week).

@petrbouchal

FWIW, {ISOweek} is now the correct solution, I think - I just ended up using it on the same data (national, not Eurostat, but produced to the same standard). Perhaps tidyverse/lubridate#506 (comment) may also be helpful.

And thanks for {eurostat}, very helpful!

@justasmundeikis

My solution is to filter out the W99 data, which is definitely not a clean solution, but given that it only affects Hungary, Latvia and Sweden, it's a workaround.

W99 values by geo and year:

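| geo/year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HU | 5 | 4 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| LV | 90 | 72 | 63 | 41 | 19 | 33 | 29 | 33 | 33 | 19 | 20 | 13 | 18 | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| SE | 1493 | 1534 | 1520 | 1437 | 1137 | 822 | 749 | 650 | 515 | 538 | 402 | 428 | 439 | 464 | 486 | 960 | 1963 | 2230 | 2513 | 2616 | 2663 | 713 |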

So right now I have this code working fine:

library(dplyr)

df <- demo_r_mwk_ts %>%
  # extract year and week number from the time code
  mutate(year = substr(time, 1, 4),
         week = substr(time, 6, 7)) %>%
  # filter out week 99 (unknown week); week is a character column
  filter(week != "99") %>%
  # create the date (Monday of the ISO week) using the "ISOweek" package
  mutate(date = ISOweek::ISOweek2date(paste0(year, "-W", week, "-1")))

The best way would be if Eurostat divided the W99 values and assigned them to each week of the year in proportion to the known weekly values (the week "weights"). If anybody works with countries that have W99 data, then I would suggest doing this manually.
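
A manual redistribution along those lines could look roughly like the base R sketch below. The helper name redistribute_w99 and the toy numbers are made up for illustration; in practice the function would have to be applied separately for each country and year, and it only makes sense when the known weekly total is non-zero.

```r
# Hypothetical helper: spread the W99 ("unknown week") count over the known
# weeks of one country-year, in proportion to each week's share of the
# known total. Equivalent to scaling every known week by the same factor.
redistribute_w99 <- function(week, values) {
  known <- week != 99
  unknown_total <- sum(values[!known])
  known_total <- sum(values[known])
  adjusted <- values[known] * (1 + unknown_total / known_total)
  data.frame(week = week[known], values = adjusted)
}

# Made-up numbers for a single geo and year, purely for illustration.
toy <- data.frame(week = c(1, 2, 3, 99), values = c(100, 300, 600, 50))
redistribute_w99(toy$week, toy$values)
# weeks 1, 2, 3 become 105, 315, 630; the yearly total of 1050 is preserved
```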

@antagomir
Member

If this is a common need, would it be feasible to have an additional enrichment function that could be run after data retrieval?

@tamas-ferenci
Author

tamas-ferenci commented Apr 15, 2021

"The best way would be if Eurostat divided the W99 values and assigned them to each week of the year in proportion to the known weekly values (the week "weights"). If anybody works with countries that have W99 data, then I would suggest doing this manually." I completely agree. As a minimal solution, proportionally increasing all values would work in my opinion. (At least if the proportion of values reported for W99 is small compared to the total.)

@pitkant
Member

pitkant commented Jun 28, 2023

I did some testing with the dataset mentioned here and I have to say fixing this weekly data issue was easier than figuring out how to efficiently handle this dataset with 110 million rows (after pivot_longer). 16 GB of RAM apparently wasn't enough the way it was done before. The results are in commit cfdaf37 of the v4-dev branch (version 4.0.0.9002).

Based on the discussion here I couldn't figure out a sensible solution for W99 values. Drop them? Assign them to the last day of the year? Distribute the values evenly over the whole year? In my solution I coerced W99 to the first day of the first week of the year, and the function prints a warning message suggesting that users call get_eurostat(time_format = "raw") if they wish to wrangle the data manually. This might not be optimal and I'd love to hear your thoughts on the matter.
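
To make that behaviour concrete, here is a rough base R illustration of "coerce W99 to the first day of week 1 and warn". It is not the code from the commit mentioned above; the function name coerce_w99 and the "2020W01" code layout are assumptions made for this sketch.

```r
# Illustrative only, not the package's implementation: treat W99 as week 1
# of the same year and tell the user about it.
coerce_w99 <- function(code) {
  year <- as.integer(substr(code, 1, 4))
  week <- as.integer(substr(code, 6, 7))
  is_w99 <- week == 99L
  if (any(is_w99)) {
    warning("W99 (unknown week) coerced to the first day of week 1; ",
            "use time_format = \"raw\" to handle these values manually.")
  }
  week[is_w99] <- 1L
  # Monday of ISO week 1 is the Monday on or before 4 January.
  jan4 <- as.Date(sprintf("%04d-01-04", year))
  monday_w1 <- jan4 - (as.integer(format(jan4, "%u")) - 1L)
  monday_w1 + 7L * (week - 1L)
}

suppressWarnings(coerce_w99(c("2020W02", "2020W99")))
# [1] "2020-01-06" "2019-12-30"
```

The trade-off is visible in the output: W99 rows become indistinguishable from genuine week 1 rows once converted, which is why the warning and the raw time format matter.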

@pitkant
Member

pitkant commented Dec 20, 2023

Closed with the CRAN release of package version 4.0.0

@pitkant closed this as completed Dec 20, 2023

6 participants