
Mistake in French locale #194

Closed
briatte opened this Issue · 7 comments

3 participants

@briatte

Hello,

There is a parsing issue in the French locale:

> dmy("3 janvier 2013", locale = "fr_FR")
[1] "2013-01-03 UTC"
> dmy("3 juillet 2013", locale = "fr_FR")
[1] NA
Warning message:
All formats failed to parse. No formats found.

Juillet is the French name for July.

If this is not an issue with lubridate itself, could you let me know where I should report it?

@vspinu
Collaborator

Works fine for me under Linux:

dmy("3 janvier 2013", locale = "fr_FR.utf8")
[1] "2013-01-03 UTC"

What do you get from

lubridate:::.build_locale_regs("fr_FR.utf8")$alpha_exact[["b"]]

The very long thread in #181 might be relevant here, but it would be good to know more about your system.

@briatte

You are right, the problem is on my end:

[1] "((?<b_b_e>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)|(?<B_b_e>January|February|March|April|May|June|July|August|September|October|November|December))(?![[:alpha:]])"
Warning message:
In Sys.setlocale("LC_TIME", locale) :
  OS reports request to set locale to "fr_FR.utf8" cannot be honored

I am running Mac OS X, with the whole system set to English.

Session info:

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] colorspace_1.2-2   dichromat_2.0-0    digest_0.6.3       ggplot2_0.9.3.1    grid_3.0.1        
 [6] gtable_0.1.2       labeling_0.2       MASS_7.3-27        munsell_0.4.2      plyr_1.8          
[11] proto_0.3-10       RColorBrewer_1.0-5 reshape2_1.2.2     scales_0.2.3       stringr_0.6.2     
[16] tools_3.0.1       

It'd be useful to clarify why juillet is the only month that does not parse, though. The rest of the months work fine.

@hadley
Owner

On OS X the locale is called fr_FR:

> lubridate:::.build_locale_regs("fr_FR")$alpha_exact[["b"]]
[1] "((?<b_b_e>jan|fév|mar|avr|mai|jui|jul|aoû|sep|oct|nov|déc)|(?<B_b_e>janvier|février|mars|avril|mai|juin|juillet|août|septembre|octobre|novembre|décembre))(?![[:alpha:]])"

It's not obvious why juillet fails to match that regexp, though.

@briatte

Same thing here:

> lubridate:::.build_locale_regs("fr_FR")$alpha_exact[["b"]]
[1] "((?<b_b_e>jan|fév|mar|avr|mai|jui|jul|aoû|sep|oct|nov|déc)|(?<B_b_e>janvier|février|mars|avril|mai|juin|juillet|août|septembre|octobre|novembre|décembre))(?![[:alpha:]])"

And the month names do match outside lubridate:

> regexpr("janvier|février|mars|avril|mai|juin|juillet|août|septembre|octobre|novembre|décembre", "13 juillet 2013")
[1] 4
attr(,"match.length")
[1] 7
> grepl("janvier|février|mars|avril|mai|juin|juillet|août|septembre|octobre|novembre|décembre", "13 juillet 2013")
[1] TRUE
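The check above uses a simplified pattern and the default regex engine. The pattern printed by `.build_locale_regs()` also contains named groups (`(?<b_b_e>...)`) and a negative lookahead, which require PCRE (`perl = TRUE`). A minimal sketch that tests the full pattern against the same string, assuming a UTF-8 session (`pat` and `m` are illustrative names, and this is not necessarily how lubridate applies the pattern internally):

```r
# Full month pattern as printed by lubridate:::.build_locale_regs("fr_FR")
pat <- "((?<b_b_e>jan|fév|mar|avr|mai|jui|jul|aoû|sep|oct|nov|déc)|(?<B_b_e>janvier|février|mars|avril|mai|juin|juillet|août|septembre|octobre|novembre|décembre))(?![[:alpha:]])"

# Named groups and lookahead need PCRE. "jui" matches first, the lookahead
# then rejects it (an "l" follows), and PCRE backtracks to "juillet".
m <- regexpr(pat, "13 juillet 2013", perl = TRUE)
regmatches("13 juillet 2013", m)
```

If this returns `"juillet"`, the regular expression itself is not the culprit and the failure is more likely further down the parsing chain.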

@vspinu
Collaborator

I meant that juillet is parsed for me on linux:

dmy("3 juillet 2013", locale = "fr_FR.utf8")
[1] "2013-07-03 UTC"

So it is again a regexp issue on OS X, and it looks more and more like our Japanese friend's problem.

@briatte

R 3.0.3 has solved this issue. From the R 3.0.3 NEWS:

  • strptime() now checks the locale only when locale-specific formats are used and caches the locale in use: this can halve the time taken on OSes with slow system functions (e.g. OS X).

  • strptime() and the format() methods for classes "POSIXct", "POSIXlt" and "Date" recognize strings with marked encodings: this allows, for example, UTF-8 French month names to be read on (French) Windows.

The mistake it caused in lubridate is gone:

library(lubridate)

clean_date = function(x) {

  # fix bugs in dates
  x = gsub("juillet", "07", x) # fix small bug in French month parser

  # parse to Date format
  x = parse_date_time(x, "%d %m* %Y", locale = "fr_FR.UTF-8")
  x = as.Date(x)

  return(x)
}

unclean_date = function(x) {

  # parse to Date format
  x = parse_date_time(x, "%d %m* %Y", locale = "fr_FR.UTF-8")
  x = as.Date(x)

  return(x)
}

clean_date("15 juillet 2007")
[1] "2007-07-15"

# returns NA on R < 3.0.3
unclean_date("15 juillet 2007")
[1] "2007-07-15"

The "juillet" month does not need fixing anymore.
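For anyone stuck on R < 3.0.3, the `gsub("juillet", "07", x)` trick in `clean_date()` can be extended to every month, so that parsing never needs a French `LC_TIME` locale at all. A minimal sketch, assuming lowercase "day month year" input; `fr_months` and `parse_fr_date()` are hypothetical names, not part of lubridate:

```r
# Hypothetical helper (not part of lubridate): replace French month names with
# month numbers, then parse with a purely numeric format so the result does not
# depend on which locales the OS can provide.
# Assumes lowercase "day month year" input such as "15 juillet 2007".
fr_months <- c("janvier", "février", "mars", "avril", "mai", "juin",
               "juillet", "août", "septembre", "octobre", "novembre", "décembre")

parse_fr_date <- function(x) {
  for (i in seq_along(fr_months)) {
    # fixed = TRUE: literal match; no full month name contains another one
    x <- sub(fr_months[i], sprintf("%02d", i), x, fixed = TRUE)
  }
  # "%d %m %Y" uses numeric fields only, so no month names are looked up
  as.Date(x, format = "%d %m %Y")
}

parse_fr_date("15 juillet 2007")
```

Strings that do not fit the assumed format come back as `NA`, as with `as.Date()` itself.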

@briatte briatte closed this