Accident_Index is not identical to Accident_Index #83
Codecov Report
@@ Coverage Diff @@
## master #83 +/- ##
=======================================
Coverage 90.36% 90.36%
=======================================
Files 6 6
Lines 249 249
=======================================
Hits 225 225
Misses 24 24
# Note: "ropensi" is a typo for "ropensci", hence the 404 below.
devtools::install_github("ropensi/stats19")
#> Error: HTTP error 404.
#> Not Found
#>
#> Rate limit remaining: 56/60
#> Rate limit reset at: 2019-01-14 22:00:47 UTC
#>
#>
library(stats19)
#> Data provided under OGL v3.0. Cite the source and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
acc_2017 = stats19::file_names$dftRoadSafetyData_Accidents_2017.zip
dl_stats19(year = 2017, type = 'acc')
#> Files identified: dftRoadSafetyData_Accidents_2017.zip
#> http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/dftRoadSafetyData_Accidents_2017.zip
#> Attempt downloading from:
#> Data saved at /var/folders/z7/l4z5fwqs2ksfv22ghh2n9smh0000gp/T//RtmpUJVRWT/dftRoadSafetyData_Accidents_2017/Acc.csv
r = read.csv(file.path(tempdir(),
sub(".zip", "", acc_2017),
"Acc.csv"),
nrows = 1)
n = names(r)[1]
nchar(n)
#> [1] 14
Created on 2019-01-14 by the reprex package
curl -o acc2017.zip http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/dftRoadSafetyData_Accidents_2017.zip
unzip acc2017.zip
head -n 1 acc.csv | file -i -
head -n 1 acc.csv | cut -c 1-15
Accident_Index
head -n 1 acc.csv | cut -c 1-16
Accident_Index,
So we are sure it's 14 characters.
# Note: "ropensi" is a typo for "ropensci", hence the 404 below.
devtools::install_github("ropensi/stats19")
#> Downloading GitHub repo ropensi/stats19@master
#> from URL https://api.github.com/repos/ropensi/stats19/zipball/master
#> Installation failed: Not Found (404)
library(stats19)
#> Data provided under OGL v3.0. Cite the source and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
acc_2017 = stats19::file_names$dftRoadSafetyData_Accidents_2017.zip
dl_stats19(year = 2017, type = 'acc')
#> Files identified: dftRoadSafetyData_Accidents_2017.zip
#> http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/dftRoadSafetyData_Accidents_2017.zip
#> Attempt downloading from:
#> Data saved at C:\Users\georl\AppData\Local\Temp\RtmpuuS6fK/dftRoadSafetyData_Accidents_2017/Acc.csv
r = read.csv(file.path(tempdir(),
sub(".zip", "", acc_2017),
"Acc.csv"),
nrows = 1)
n = names(r)[1]
nchar(n)
#> [1] 17
Created on 2019-01-15 by the reprex package (v0.2.0).
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.5.0 (2018-04-23)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United Kingdom.1252
#> tz Europe/London
#> date 2019-01-15
#> Packages -----------------------------------------------------------------
#> package * version date source
#> backports 1.1.2 2017-12-13 CRAN (R 3.5.0)
#> base * 3.5.0 2018-04-23 local
#> compiler 3.5.0 2018-04-23 local
#> curl 3.2 2018-03-28 CRAN (R 3.5.0)
#> datasets * 3.5.0 2018-04-23 local
#> devtools 1.13.6 2018-06-27 CRAN (R 3.5.1)
#> digest 0.6.15 2018-01-28 CRAN (R 3.5.0)
#> evaluate 0.11 2018-07-17 CRAN (R 3.5.1)
#> git2r 0.23.0 2018-07-17 CRAN (R 3.5.1)
#> graphics * 3.5.0 2018-04-23 local
#> grDevices * 3.5.0 2018-04-23 local
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.5.0)
#> httr 1.3.1 2017-08-20 CRAN (R 3.5.0)
#> jsonlite 1.5 2017-06-01 CRAN (R 3.5.0)
#> knitr 1.20 2018-02-20 CRAN (R 3.5.0)
#> magrittr 1.5 2014-11-22 CRAN (R 3.5.0)
#> memoise 1.1.0 2017-04-21 CRAN (R 3.5.0)
#> methods * 3.5.0 2018-04-23 local
#> R6 2.2.2 2017-06-17 CRAN (R 3.5.0)
#> Rcpp 0.12.18 2018-07-23 CRAN (R 3.5.1)
#> rmarkdown 1.10 2018-06-11 CRAN (R 3.5.1)
#> rprojroot 1.3-2 2018-01-03 CRAN (R 3.5.0)
#> stats * 3.5.0 2018-04-23 local
#> stats19 * 0.1.1 2019-01-15 Github (ropensci/stats19@dc5da5e)
#> stringi 1.1.7 2018-03-12 CRAN (R 3.5.0)
#> stringr 1.3.1 2018-05-10 CRAN (R 3.5.1)
#> tools 3.5.0 2018-04-23 local
#> utils * 3.5.0 2018-04-23 local
#> withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
#> yaml 2.2.0 2018-07-25 CRAN (R 3.5.1)
Heads-up @layik: the above reprex shows the behaviour on Windows. Hope that is useful. It may be worth asking the question here if you're finding consistently strange behaviour: https://stat.ethz.ch/mailman/listinfo/r-devel
From an email into The bash code can show that the column name actually has what is called a Byte Order Mark (BOM), Unicode value
The question is then why does
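A minimal sketch of how the byte order mark could be checked from R itself, assuming the file is UTF-8 encoded. The temporary CSV and its contents are made up for illustration and are not the DfT data; `"UTF-8-BOM"` is the encoding name described in `?file`, which drops the mark when reading.

```r
# Build a small CSV that starts with the UTF-8 byte order mark (EF BB BF).
# The file contents are invented for this example.
bom = as.raw(c(0xEF, 0xBB, 0xBF))
tmp = tempfile(fileext = ".csv")
con = file(tmp, open = "wb")
writeBin(bom, con)
writeBin(charToRaw("Accident_Index,Longitude\nA001,-0.08\n"), con)
close(con)

# Look at the first three bytes of the file to see whether the mark is there.
first_bytes = readBin(tmp, what = "raw", n = 3)
has_bom = identical(first_bytes, bom)

# Ask the connection to drop the mark while reading the header.
r = read.csv(tmp, nrows = 1, fileEncoding = "UTF-8-BOM")
n = names(r)[1]
print(has_bom)
print(nchar(n))
```

Checking the raw bytes sidesteps the platform's locale, so the result should not depend on whether the code runs on Windows, macOS or Linux.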
Link to conversation: http://r.789695.n4.nabble.com/Potential-R-bug-in-identical-td4754898.html
Interesting stuff. I imagine there's a reason why
Is this an endian issue?
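On the endian question, a hedged note: UTF-8 has no byte-order variants, so its byte order mark is always the same three bytes (EF BB BF) on every platform. The sketch below compares the column name byte by byte, with and without that prefix. `strip_bom` is a hypothetical helper written for this example, not part of stats19 or base R.

```r
# The UTF-8 byte order mark and the plain column name as raw bytes.
bom = as.raw(c(0xEF, 0xBB, 0xBF))
plain = charToRaw("Accident_Index")
with_bom = c(bom, plain)

# Hypothetical helper: drop a leading UTF-8 BOM from a raw vector, if present.
strip_bom = function(bytes) {
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) bytes[-(1:3)] else bytes
}

n_plain = length(plain)        # bytes in the name alone
n_with_bom = length(with_bom)  # bytes in the name plus the mark
same_before = identical(with_bom, plain)
same_after = identical(strip_bom(with_bom), plain)
print(c(n_plain, n_with_bom))
print(c(same_before, same_after))
```

The three extra bytes are at least consistent with the 14 vs 17 difference between the two reprexes, although whether that is what happens on Windows is an assumption here, not something the thread confirms.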
One for the
The identical function in R behaves differently on macOS and Linux vs Windows for a particular case in stats19 strings that can be reproduced.
Links & notes:
https://www.r-project.org/bugs.html
crashes 2017 file first column name here results in
http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/dftRoadSafetyData_Accidents_2017.zip