/data-raw/afl_tables_playerstats/afldata.rda is not current #69

Closed
afableco opened this issue Apr 2, 2019 · 5 comments

Comments

@afableco afableco commented Apr 2, 2019

The data in afldata.rda is not current, so the weekly updates do not fix the out-of-date data.

I am trying to use the Brownlow.Votes field for 2018, but it contains only 216 votes rather than the full 1188 (6 voting points * 198 games).

When I ran:

temp <- get_afltables_urls('2018-01-01','2018-12-31')
dat_new <- scrape_afltables_match(temp)

The resulting table ends up with the full 1188 votes.

@jimmyday12 jimmyday12 commented Apr 2, 2019

@afableco The function get_afltables_stats() does exactly what you do here under the hood!

It takes start_date and end_date arguments too.

get_afltables_stats('2018-01-01', '2018-12-31')
@jimmyday12 jimmyday12 closed this Apr 2, 2019

@afableco afableco commented Apr 2, 2019

@jimmyday12 I agree it is doing this under the hood, but you have missed my point. The file afldata.rda, which you maintain on GitHub, does not have the correct Brownlow.Votes in it for 2018. get_afltables_stats downloads afldata.rda and only scrapes the AFL Tables website if the data you are asking for is after the maximum date in afldata.rda. If you run the code below you will see what I mean.

library(fitzRoy)
library(dplyr)

# Via the cached afldata.rda: Brownlow.Votes sums to 216
Using_get_afltables_stats <- get_afltables_stats('2018-01-01', '2018-12-31')
Using_get_afltables_stats %>% summarise(sum(Brownlow.Votes))

# Scraping AFL Tables directly: Brownlow.Votes sums to 1188
temp <- get_afltables_urls('2018-01-01','2018-12-31')
dat_new <- scrape_afltables_match(temp)
dat_new %>% summarise(sum(Brownlow.Votes))
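
To illustrate the caching behaviour described in the comment above, here is a minimal sketch in base R. It is not the fitzRoy implementation: `cached`, `scrape_fresh` and `get_stats` are hypothetical stand-ins, and the vote values are made up. It only shows why a request that ends before the cache's maximum date never triggers a re-scrape, so stale rows come back unchanged.

```r
# Hypothetical stand-in for the cached afldata.rda (values are made up).
cached <- data.frame(
  Date = as.Date(c("2018-03-22", "2018-09-01", "2019-03-21")),
  Brownlow.Votes = c(3, 0, 0)  # second row is stale: votes were never filled in
)

scrape_calls <- 0

# Hypothetical stand-in for scraping AFL Tables; counts how often it is called.
scrape_fresh <- function(from, to) {
  scrape_calls <<- scrape_calls + 1
  data.frame(Date = as.Date("2019-03-28"), Brownlow.Votes = 0)
}

# Same shape as the logic described above: scrape only past the cache's max date.
get_stats <- function(cached, start_date, end_date) {
  max_date <- max(cached$Date)
  if (end_date > max_date) {
    cached <- rbind(cached, scrape_fresh(max_date, end_date))
  }
  cached[cached$Date > start_date & cached$Date < end_date, ]
}

season_2018 <- get_stats(cached, as.Date("2018-01-01"), as.Date("2018-12-31"))
sum(season_2018$Brownlow.Votes)  # the stale row is still counted as 0
scrape_calls                     # the scraper was never called
```

Because the cache already extends into 2019, the 2018 request is served entirely from the cached rows, whatever state they are in.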

@jimmyday12 jimmyday12 commented Apr 2, 2019

Ahhh I see what you mean, thanks for clarifying. I can take a look.

@jimmyday12 jimmyday12 reopened this Apr 2, 2019
@jimmyday12 jimmyday12 closed this in 7941559 Apr 3, 2019

@afableco afableco commented Apr 3, 2019

I just tried to run get_afltables_stats, but I had the same issue. On further investigation, it appears the URL within get_afltables_stats that links to afldata.rda on GitHub references the develop branch and not the master branch. Sorry for the bum steer earlier.

In get_afltables_stats the current URL is:

dat_url <- url("https://github.com/jimmyday12/fitzRoy/raw/develop/data-raw/afl_tables_playerstats/afldata.rda")

The correct data was returned when I replaced it with:

dat_url <- url("https://github.com/jimmyday12/fitzRoy/blob/master/data-raw/afl_tables_playerstats/afldata.rda?raw=true")
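
As far as I know, both URL forms (`/raw/<branch>/...` and `/blob/<branch>/...?raw=true`) return the raw file, so the part that matters is the branch segment. A small sketch, assuming the repository layout shown in the two URLs above; `afldata_url` is a hypothetical helper and not part of fitzRoy:

```r
# Hypothetical helper: build the raw-file URL for afldata.rda on a given branch.
afldata_url <- function(branch) {
  sprintf(
    "https://github.com/jimmyday12/fitzRoy/raw/%s/data-raw/afl_tables_playerstats/afldata.rda",
    branch
  )
}

afldata_url("develop")  # the branch the function was pointing at
afldata_url("master")   # the branch holding the corrected data
```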

function(start_date = "1897-01-01", end_date = Sys.Date()) {
    start_date <- lubridate::parse_date_time(start_date, c("dmy", "ymd"))
    if (is.na(start_date)) {
        stop(paste("Date format not recognised", "Check that start_date is in dmy or ymd format"))
    }
    end_date <- lubridate::parse_date_time(end_date, c("dmy", "ymd"))
    if (is.na(end_date)) {
        stop(paste("Date format not recognised", "Check that end_date is in dmy or ymd format"))
    }
    message(paste0("Returning data from ", start_date, " to ", end_date))
    dat_url <- url("https://github.com/jimmyday12/fitzRoy/blob/master/data-raw/afl_tables_playerstats/afldata.rda?raw=true")
    load_r_data <- function(fname) {
        load(fname)
        get(ls()[ls() != "fname"])
    }
    dat <- load_r_data(dat_url)
    max_date <- max(dat$Date)
    if (end_date > max_date) {
        urls <- get_afltables_urls(max_date, end_date)
        dat_new <- scrape_afltables_match(urls)
        dat <- dplyr::bind_rows(dat, dat_new)
    }
    message("Finished getting afltables data")
    dat <- dat %>%
        dplyr::group_by(.data$ID) %>%
        dplyr::mutate(
            First.name = dplyr::first(.data$First.name),
            Surname = dplyr::first(.data$Surname)
        )
    dat$Round[dat$Round == "Grand Final"] <- "GF"
    dat$Round[dat$Round == "Elimination Final"] <- "EF"
    dat$Round[dat$Round == "Preliminary Final"] <- "PF"
    dat$Round[dat$Round == "Qualifying Final"] <- "QF"
    dat$Round[dat$Round == "Semi Final"] <- "SF"
    dplyr::filter(dat, .data$Date > start_date & .data$Date < end_date) %>%
        dplyr::ungroup()
}

@jimmyday12 jimmyday12 commented Apr 3, 2019

@afableco Yep, you are spot on (a few other small changes were required too). I pushed a commit a few hours ago, so if you reinstall you should get the working function.

I've also added a unit test specifically for your issue, so it should get picked up on Travis CI if anything gets out of sync like that again.
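
For anyone wanting a similar guard locally, a check along these lines would catch the problem. This is only a sketch, not the test that was added to the package: `check_brownlow_total` is a hypothetical helper, and the 6 votes per game and 198 games are the figures quoted earlier in this thread.

```r
# Hypothetical helper: fail loudly if a season's Brownlow votes are incomplete.
check_brownlow_total <- function(dat, n_games, votes_per_game = 6) {
  expected <- n_games * votes_per_game
  actual <- sum(dat$Brownlow.Votes, na.rm = TRUE)
  if (actual != expected) {
    stop(sprintf("Brownlow votes sum to %s, expected %s", actual, expected))
  }
  invisible(TRUE)
}

# Made-up data for two games: one complete, one with votes missing.
complete <- data.frame(Brownlow.Votes = c(3, 2, 1, 3, 2, 1))
stale    <- data.frame(Brownlow.Votes = c(3, 2, 1, 0, 0, 0))

check_brownlow_total(complete, n_games = 2)                       # passes silently
result <- try(check_brownlow_total(stale, n_games = 2), silent = TRUE)
inherits(result, "try-error")                                     # the stale data is flagged
```

Against real data the call would look like `check_brownlow_total(get_afltables_stats("2018-01-01", "2018-12-31"), n_games = 198)`, which needs fitzRoy installed and network access.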
