diff --git a/R/PM10.R b/R/PM10.R index b229a5e..fe84dfa 100644 --- a/R/PM10.R +++ b/R/PM10.R @@ -1,23 +1,65 @@ -##' TITLE +##' Real-world data on PM10 pollution in Rouen area, France ##' -##' DESCRIPTION -##' -##' @format The format is a list of 2 component: -##' -##' $x: A data-frame containing input variables: with 100 obs. of 200 -##' variables ; -##' -##' $y: Output variable: a factor with 2 levels "-1" and "1". -##' -##' @examples -##' -##' ##' -##' \dontrun{ +##' These data are TEOM (Tapered Element Oscillating Microbalance) PM10 +##' concentrations from 2004 to 2006 (1096 days) measured by Air Normand, and +##' the associated weather data provided by Météo France, the French national +##' meteorological service, using six different monitoring sites. +##' +##' Six different monitoring stations of the Rouen (Haute Normandie, France) +##' area are considered. The urban station \code{jus}, the traffic station +##' \code{gui}, the second most polluted in the region, and \code{gcm} which is +##' located in an industrial area. In Le Havre, are considered the stations +##' \code{rep} (the most polluted in the region) and \code{hri} located at the +##' seaside. Lastly, the station \code{ail} near Dieppe, because it is rural and +##' coastal, and a priori hardly influenced by social and industrial activity. +##' Grouping by categories: \code{jus} and \code{hri} are background urban +##' monitoring sites, \code{gui} and \code{rep} are urban sites close to +##' traffic, \code{gcm} is industrial and \code{ail} is rural. +##' +##' @format Each object is a data frame. 
+##' +##' The description of the 18 variables is the following (note that for +##' \code{gcm} station, only the pollutant SO2 is available, and there is no +##' pollutant for \code{ail} station): ##' +##' \describe{ +##' \item{PM10}{Daily mean concentration of PM10, in \eqn{\mu}g/m3} +##' \item{NO, NO2, SO2}{Daily mean concentration of NO, NO2, SO2, in +##' \eqn{\mu}g/m3} +##' \item{T.min, T.max, T.moy}{Daily minimum, maximum and mean temperature, in +##' °C} +##' \item{DV.maxvv, DV.dom}{Daily maximum speed and dominant wind direction, +##' in ° (for wind direction, 0° corresponds to north)} +##' \item{VV.max, VV.moy}{Daily maximum and mean wind speed, in m/s} +##' \item{PL.som}{Daily rainfall, in mm} +##' \item{HR.min, HR.max, HR.moy}{Daily minimum, maximum and mean relative +##' humidity, in \%} +##' \item{PA.moy}{Daily mean air pressure, in hPa} +##' \item{GTrouen, GTlehavre}{Daily temperature gradient, in °C} ##' } ##' -##' @source Weston, J., Elisseff, A., Schoelkopf, B., Tipping, M. (2003), -##' \emph{Use of the zero norm with linear models and Kernel methods}, -##' J. Machine Learn. Res. 3, 1439-1461 -##' -"jus" \ No newline at end of file +##' @source F.-X. Jollois, J.-M. Poggi, B. Portier, \emph{Three non-linear +##' statistical methods to analyze PM10 pollution in Rouen area}. CSBIGS 3(1): +##' 1-17, 2009 +##' +##' @docType data +##' @name PM10 +NULL + +##' @rdname PM10 +"ail" + +##' @rdname PM10 +"gcm" + +##' @rdname PM10 +"gui" + +##' @rdname PM10 +"hri" + +##' @rdname PM10 +"jus" + +##' @rdname PM10 +"rep" \ No newline at end of file diff --git a/R/toys.R b/R/toys.R index bd2a133..9797455 100644 --- a/R/toys.R +++ b/R/toys.R @@ -3,28 +3,31 @@ ##' \code{toys} is a simple simulated dataset of a binary classification ##' problem, introduced by Weston et.al.. 
##' -##' It is an equiprobable two class problem, Y belongs to {-1,1}, with six +##' It is an equiprobable two class problem, Y belongs to \{-1,1\}, with six ##' true variables, the others being some noise. ##' The simulation model is defined through the conditional distribution -##' of the Xi for Y=y: +##' of the X_i for Y=y: ##' -##' with probability 0.7, X^j ~ N(yj,1) for j=1,2,3 and +##' \itemize{ +##' \item with probability 0.7, X^j ~ N(yj,1) for j=1,2,3 and ##' X^j ~ N(0,1) for j=4,5,6 ; ##' -##' with probability 0.3, X^j ~ N(0,1) for j=1,2,3 and +##' \item with probability 0.3, X^j ~ N(0,1) for j=1,2,3 and ##' X^j ~ N(y(j-3),1) for j=4,5,6 ; ##' -##' the other variables are noise, X^j ~ N(0,1) +##' \item the other variables are noise, X^j ~ N(0,1) ##' for j=7,\dots,p. +##' } ##' ##' After simulation, the obtained variables are finally standardized. ##' -##' @format The format is a list of 2 component: -##' -##' $x: A data-frame containing input variables: with 100 obs. of 200 -##' variables ; +##' @format The format is a list of 2 components: ##' -##' $y: Output variable: a factor with 2 levels "-1" and "1". +##' \describe{ +##' \item{x}{a dataframe containing input variables: with 100 obs. 
of 200 +##' variables} +##' \item{y}{output variable: a factor with 2 levels "-1" and "1"} +##' } ##' ##' @examples ##' data(toys) diff --git a/data/ail_comp.rda b/data/ail_comp.rda deleted file mode 100644 index 93c03a6..0000000 Binary files a/data/ail_comp.rda and /dev/null differ diff --git a/data/gcm_comp.rda b/data/gcm_comp.rda deleted file mode 100644 index 4608afe..0000000 Binary files a/data/gcm_comp.rda and /dev/null differ diff --git a/data/gui_comp.rda b/data/gui_comp.rda deleted file mode 100644 index babafae..0000000 Binary files a/data/gui_comp.rda and /dev/null differ diff --git a/data/hri_comp.rda b/data/hri_comp.rda deleted file mode 100644 index e35ad3a..0000000 Binary files a/data/hri_comp.rda and /dev/null differ diff --git a/data/jus_comp.rda b/data/jus_comp.rda deleted file mode 100644 index 4b41219..0000000 Binary files a/data/jus_comp.rda and /dev/null differ diff --git a/data/rep_comp.rda b/data/rep_comp.rda deleted file mode 100644 index bf77cd1..0000000 Binary files a/data/rep_comp.rda and /dev/null differ diff --git a/man/PM10.Rd b/man/PM10.Rd new file mode 100644 index 0000000..1bfc9db --- /dev/null +++ b/man/PM10.Rd @@ -0,0 +1,70 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/PM10.R +\docType{data} +\name{PM10} +\alias{PM10} +\alias{ail} +\alias{gcm} +\alias{gui} +\alias{hri} +\alias{jus} +\alias{rep} +\title{Real-world data on PM10 pollution in Rouen area, France} +\format{Each object is a data frame. 
+ + The description of the 18 variables is the following (note that for + \code{gcm} station, only the pollutant SO2 is available, and there is no + pollutant for \code{ail} station): + +\describe{ + \item{PM10}{Daily mean concentration of PM10, in \eqn{\mu}g/m3} + \item{NO, NO2, SO2}{Daily mean concentration of NO, NO2, SO2, in + \eqn{\mu}g/m3} + \item{T.min, T.max, T.moy}{Daily minimum, maximum and mean temperature, in + °C} + \item{DV.maxvv, DV.dom}{Daily maximum speed and dominant wind direction, + in ° (for wind direction, 0° corresponds to north)} + \item{VV.max, VV.moy}{Daily maximum and mean wind speed, in m/s} + \item{PL.som}{Daily rainfall, in mm} + \item{HR.min, HR.max, HR.moy}{Daily minimum, maximum and mean relative + humidity, in \%} + \item{PA.moy}{Daily mean air pressure, in hPa} + \item{GTrouen, GTlehavre}{Daily temperature gradient, in °C} +}} +\source{ +F.-X. Jollois, J.-M. Poggi, B. Portier, \emph{Three non-linear + statistical methods to analyze PM10 pollution in Rouen area}. CSBIGS 3(1): + 1-17, 2009 +} +\usage{ +ail + +gcm + +gui + +hri + +jus + +rep +} +\description{ +These data are TEOM (Tapered Element Oscillating Microbalance) PM10 +concentrations from 2004 to 2006 (1096 days) measured by Air Normand, and +the associated weather data provided by Météo France, the French national +meteorological service, using six different monitoring sites. +} +\details{ +Six different monitoring stations of the Rouen (Haute Normandie, France) +area are considered. The urban station \code{jus}, the traffic station +\code{gui}, the second most polluted in the region, and \code{gcm} which is +located in an industrial area. In Le Havre, are considered the stations +\code{rep} (the most polluted in the region) and \code{hri} located at the +seaside. Lastly, the station \code{ail} near Dieppe, because it is rural and +coastal, and a priori hardly influenced by social and industrial activity. 
+Grouping by categories: \code{jus} and \code{hri} are background urban +monitoring sites, \code{gui} and \code{rep} are urban sites close to +traffic, \code{gcm} is industrial and \code{ail} is rural. +} +\keyword{datasets} diff --git a/man/jus.Rd b/man/jus.Rd deleted file mode 100644 index db32594..0000000 --- a/man/jus.Rd +++ /dev/null @@ -1,32 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/PM10.R -\docType{data} -\name{jus} -\alias{jus} -\title{TITLE} -\format{The format is a list of 2 component: - -$x: A data-frame containing input variables: with 100 obs. of 200 -variables ; - -$y: Output variable: a factor with 2 levels "-1" and "1".} -\source{ -Weston, J., Elisseff, A., Schoelkopf, B., Tipping, M. (2003), -\emph{Use of the zero norm with linear models and Kernel methods}, -J. Machine Learn. Res. 3, 1439-1461 -} -\usage{ -jus -} -\description{ -DESCRIPTION -} -\examples{ - -##' -\dontrun{ - -} - -} -\keyword{datasets} diff --git a/man/toys.Rd b/man/toys.Rd index 63b7d29..d3d066e 100644 --- a/man/toys.Rd +++ b/man/toys.Rd @@ -4,12 +4,13 @@ \name{toys} \alias{toys} \title{A simulated dataset called toys data} -\format{The format is a list of 2 component: +\format{The format is a list of 2 components: -$x: A data-frame containing input variables: with 100 obs. of 200 -variables ; - -$y: Output variable: a factor with 2 levels "-1" and "1".} +\describe{ + \item{x}{a dataframe containing input variables: with 100 obs. of 200 +variables} + \item{y}{output variable: a factor with 2 levels "-1" and "1"} + }} \source{ Weston, J., Elisseff, A., Schoelkopf, B., Tipping, M. (2003), \emph{Use of the zero norm with linear models and Kernel methods}, @@ -20,19 +21,21 @@ J. Machine Learn. Res. 3, 1439-1461 problem, introduced by Weston et.al.. 
} \details{ -It is an equiprobable two class problem, Y belongs to {-1,1}, with six +It is an equiprobable two class problem, Y belongs to \{-1,1\}, with six true variables, the others being some noise. The simulation model is defined through the conditional distribution -of the Xi for Y=y: +of the X_i for Y=y: -with probability 0.7, X^j ~ N(yj,1) for j=1,2,3 and +\itemize{ + \item with probability 0.7, X^j ~ N(yj,1) for j=1,2,3 and X^j ~ N(0,1) for j=4,5,6 ; -with probability 0.3, X^j ~ N(0,1) for j=1,2,3 and +\item with probability 0.3, X^j ~ N(0,1) for j=1,2,3 and X^j ~ N(y(j-3),1) for j=4,5,6 ; -the other variables are noise, X^j ~ N(0,1) +\item the other variables are noise, X^j ~ N(0,1) for j=7,\dots,p. +} After simulation, the obtained variables are finally standardized. }