## Create a data frame from the csv file.

**read.table** - Reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file.

### List of some arguments
**read.table**(**file**, **header** = FALSE, **sep** = "", 
                **dec** = ".", row.names, col.names,
                **na.strings** = "NA", nrows = -1,
                **skip** = 0, **comment.char** = "#", fileEncoding = "", encoding = "unknown")
                
                
    file: the name of the file which the data are to be read from.
          Each row of the table appears as one line of the file.  If it
          does not contain an _absolute_ path, the file name is
          _relative_ to the current working directory, ‘getwd()’.
          Tilde-expansion is performed where supported.  This can be a
          compressed file (see ‘file’).

    header: a logical value indicating whether the file contains the
          names of the variables as its first line.  If missing, the
          value is determined from the file format: ‘header’ is set to
          ‘TRUE’ if and only if the first row contains one fewer field
          than the number of columns.
          
    sep: the field separator character.  Values on each line of the
          file are separated by this character.  If ‘sep = ""’ (the
          default for ‘read.table’) the separator is ‘white space’,
          that is one or more spaces, tabs, newlines or carriage
          returns.

    quote: the set of quoting characters. To disable quoting altogether,
          use ‘quote = ""’.  See ‘scan’ for the behaviour on quotes
          embedded in quotes.  Quoting is only considered for columns
          read as character, which is all of them unless ‘colClasses’
          is specified.

    dec: the character used in the file for decimal points.

    numerals: string indicating how to convert numbers whose conversion to
          double precision would lose accuracy, see ‘type.convert’.
          Can be abbreviated.  (Applies also to complex-number inputs.)

    row.names: a vector of row names.  This can be a vector giving the
          actual row names, or a single number giving the column of the
          table which contains the row names, or character string
          giving the name of the table column containing the row names.

          If there is a header and the first row contains one fewer
          field than the number of columns, the first column in the
          input is used for the row names.  Otherwise if ‘row.names’ is
          missing, the rows are numbered.

          Using ‘row.names = NULL’ forces row numbering. Missing or
          ‘NULL’ ‘row.names’ generate row names that are considered to
          be ‘automatic’ (and not preserved by ‘as.matrix’).

    col.names: a vector of optional names for the variables.  The default
          is to use ‘"V"’ followed by the column number.


    nrows: integer: the maximum number of rows to read in.  Negative and
          other invalid values are ignored.

    skip: integer: the number of lines of the data file to skip before
          beginning to read data.
    
    comment.char: character: a character vector of length one containing a
          single character or an empty string.  Use ‘""’ to turn off
          the interpretation of comments altogether.

    stringsAsFactors: logical: should character vectors be converted to
          factors?  Note that this is overridden by ‘as.is’ and
          ‘colClasses’, both of which allow finer control.

    fileEncoding: character string: if non-empty declares the encoding used
          on a file (not a connection) so the character data can be
          re-encoded.  See the ‘Encoding’ section of the help for
          ‘file’, the ‘R Data Import/Export Manual’ and ‘Note’.

    encoding: encoding to be assumed for input strings.  It is used to mark
          character strings as known to be in Latin-1 or UTF-8 (see
          ‘Encoding’): it is not used to re-encode the input, but
          allows R to handle encoded strings in their native encoding
          (if one of those two).  See ‘Value’ and ‘Note’.

    text: character string: if ‘file’ is not supplied and this is, then
          data are read from the value of ‘text’ via a text connection.
          Notice that a literal string can be used to include (small)
          data sets within R code.

    skipNul: logical: should nuls be skipped?
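
As a rough illustration of how `sep`, `dec`, `header` and `na.strings` interact, the sketch below reads a small inline table through the `text` argument instead of a file. The column names and values are made up for the example and do not come from root_length.csv.

```r
# Hypothetical inline data: ';' separates fields, ',' is the decimal mark
csv_text <- "id;len_1;len_2
A;1,5;2,25
B;NA;4,75"

demo <- read.table(text = csv_text, sep = ";", dec = ",",
                   header = TRUE, na.strings = "NA")

str(demo)      # len_1 and len_2 are read as numeric columns
demo$len_1     # the literal NA field becomes a missing value
```

If `dec = ","` were left out, fields such as `1,5` could not be parsed as numbers and the columns would not be numeric.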

-----------------------------------------------------------------------------------------------

           

In [None]:
data <- read.table("data/root_length.csv", sep = ";", dec = ",", header = TRUE)

### General analysis of the data

#### Head function
Returns the first parts of a vector, matrix, table, data frame or function.

#### Tail function
Returns the last parts of a vector, matrix, table, data frame or function.

In [None]:
head(data)
tail(data)
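
Both functions return six rows by default. The sketch below, on a hypothetical toy data frame rather than the root data, shows how the `n` argument changes that; a negative `n` means "everything except that many rows".

```r
# Hypothetical toy data frame, not the root_length data
demo <- data.frame(plant = 1:10,
                   Lat_roots = c(2, 3, 3, 5, 5, 5, 7, 8, 8, 9))

first_three <- head(demo, n = 3)         # rows 1 to 3
last_two <- tail(demo, n = 2)            # rows 9 and 10
all_but_last_two <- head(demo, n = -2)   # rows 1 to 8

print(first_three)
print(last_two)
nrow(all_but_last_two)
```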

#### Summary function 
 **summary** is a generic function used to produce result summaries.

In [None]:
summary(data)

#### Table function
**table** uses the cross-classifying factors to build a contingency
     table of the counts at each combination of factor levels.
     
*Let's show it for the number of lateral roots*

In [None]:
table(data$Lat_roots)
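
To make the output easier to read, here is a minimal sketch on a made-up vector (the values are assumptions, not the real root counts). The names of the resulting table are the distinct values, and the entries are how often each one occurs.

```r
# Hypothetical lateral root counts
lat_roots_demo <- c(2, 3, 3, 5, 5, 5)

counts <- table(lat_roots_demo)
counts

names(counts)     # the distinct values, as character strings
counts[["5"]]     # how many times the value 5 occurs
```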

#### Hist function
The generic function **hist** computes a histogram of the given data values.

You can control the binning with the **breaks** argument:

    breaks: one of:

            • a vector giving the breakpoints between histogram cells,

            • a function to compute the vector of breakpoints,

            • a single number giving the number of cells for the
              histogram,

            • a character string naming an algorithm to compute the
              number of cells (see ‘Details’),

            • a function to compute the number of cells.

          In the last three cases the number is a suggestion only; as
          the breakpoints will be set to ‘pretty’ values, the number is
          limited to ‘1e6’ (with a warning if it was larger).  If
          ‘breaks’ is a function, the ‘x’ vector is supplied to it as
          the only argument (and the number of breaks is only limited
          by the amount of available memory).

*We will explicitly request one break for each distinct value in the contingency table.*

In [None]:
num_of_breaks<-length(table(data$Lat_roots))
cat("# of breaks:",num_of_breaks)

table(data$Lat_roots)
hist(data$Lat_roots,breaks = num_of_breaks)
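
Because a single number passed to `breaks` is only a suggestion, the histogram may not end up with exactly one bar per count. One possible alternative, sketched below on made-up values, is to pass an explicit vector of breakpoints centred on each integer. `plot = FALSE` returns the bins without drawing, which makes them easy to inspect.

```r
# Hypothetical lateral root counts
x <- c(2, 3, 3, 5, 5, 5)

# Breakpoints at 1.5, 2.5, ..., 5.5 so that each integer gets its own cell
explicit_breaks <- seq(min(x) - 0.5, max(x) + 0.5, by = 1)

h <- hist(x, breaks = explicit_breaks, plot = FALSE)
h$breaks
h$counts    # one count per integer from 2 to 5, including the empty cell for 4
```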


#### Simple pie plot 
    pie(x, labels = names(x), edges = 200, radius = 0.8,
             clockwise = FALSE, init.angle = if(clockwise) 90 else 0,
             density = NULL, angle = 45, col = NULL, border = NULL,
             lty = NULL, main = NULL, ...)
             
We create a new variable called data.group by "cutting" the number of lateral roots into the intervals defined by the breakpoints in the vector LEVELS.

    cut(x, breaks, labels = NULL,
         include.lowest = FALSE, right = TRUE, dig.lab = 3,
         ordered_result = FALSE, ...)
    
       x: a numeric vector which is to be converted to a factor by
          cutting.

       breaks: either a numeric vector of two or more unique cut points or a
          single number (greater than or equal to 2) giving the number
          of intervals into which ‘x’ is to be cut.

       labels: labels for the levels of the resulting category.  By default,
          labels are constructed using ‘"(a,b]"’ interval notation.  If
          ‘labels = FALSE’, simple integer codes are returned instead
          of a factor.



In [None]:
LEVELS=c(0,5,10,100)

data.group <- cut( 
	data$Lat_roots, 
	LEVELS) 
data.group

data.contingency_table <- table(data.group)
data.contingency_table

pie(data.contingency_table)
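
One detail of `cut()` worth checking on your own data: with the defaults `right = TRUE` and `include.lowest = FALSE`, the intervals are open on the left, so a value equal to the smallest breakpoint falls outside every interval and becomes `NA`. The sketch below uses made-up values to show the difference.

```r
# Hypothetical values, including one equal to the lowest breakpoint
x <- c(0, 3, 5, 8, 10, 42)
demo_levels <- c(0, 5, 10, 100)

default_groups <- cut(x, demo_levels)
default_groups     # the 0 is NA because the first interval is (0,5]

closed_groups <- cut(x, demo_levels, include.lowest = TRUE)
closed_groups      # the first interval is now [0,5], so the 0 is kept
```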

## Manipulating the imported data frame

### Create a column called lateralization_factor based on the "cuts" of Lat_roots

In [None]:
LEVELS <- c(0, 5, 10, 20)
LABELS <- c("Low", "Medium", "High")

# Passing labels straight to cut() keeps all three levels even when an interval is empty
data$lateralization_factor <- cut(data$Lat_roots, LEVELS, labels = LABELS)
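
As a small sketch of how the `labels` argument of `cut()` maps intervals to names, the example below uses made-up values in which no observation falls into the middle interval. The level is still present in the result, with a count of zero.

```r
# Hypothetical values: nothing falls into (5,10]
x <- c(1, 2, 12)

f <- cut(x, breaks = c(0, 5, 10, 20), labels = c("Low", "Medium", "High"))

levels(f)    # all three labels are kept
table(f)     # "Medium" appears with a count of 0
```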

#### Showing the result

In [None]:
head(data)

### Calculating the mean of length per row

In [None]:
data$length_mean <- rowMeans(data[,2:4])
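
A minimal sketch of what `rowMeans()` does with a column selection such as `[, 2:4]`, using a hypothetical data frame whose column names are assumptions. It also shows `na.rm = TRUE`, which may be useful if some measurements are missing.

```r
# Hypothetical measurements; columns 2 to 4 hold the lengths
demo <- data.frame(id = c("A", "B"),
                   len_1 = c(1, 4),
                   len_2 = c(2, NA),
                   len_3 = c(3, 6))

mean_default <- rowMeans(demo[, 2:4])                # row B is NA
mean_no_na <- rowMeans(demo[, 2:4], na.rm = TRUE)    # row B averages 4 and 6

mean_default
mean_no_na
```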

In [None]:
head(data)

### Adding a column with the standard deviation

sd(x) calculates the standard deviation of the given vector x

Since we have a matrix of data and want to apply **sd()** to each row, we can use **apply()**.

#### Function apply()
    
     Returns a vector or array or list of values obtained by applying a
     function to margins of an array or matrix.

Usage:

     apply(X, MARGIN, FUN, ...)
     
Arguments:

       X: an array, including a matrix.

    MARGIN: a vector giving the subscripts which the function will be
          applied over.  E.g., for a matrix ‘1’ indicates rows, ‘2’
          indicates columns, ‘c(1, 2)’ indicates rows and columns.
          Where ‘X’ has named dimnames, it can be a character vector
          selecting dimension names.

     FUN: the function to be applied: see ‘Details’.  In the case of
          functions like ‘+’, ‘%*%’, etc., the function name must be
          backquoted or quoted.

     ...: optional arguments to ‘FUN’.



#### So we use: 
    
    X: the columns data[,2:4]
    
    MARGIN: '1', since we want to apply the function to each row
    
    FUN: 'sd', the function we want to apply


In [None]:
data$sd<-apply(data[,2:4],1, sd)
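
To see the effect of `MARGIN` in isolation, here is a small sketch on a made-up matrix. Extra arguments after `FUN`, such as `na.rm = TRUE`, are passed on to the function.

```r
# Hypothetical 2 x 3 matrix: each row is one individual, each column one measurement
m <- matrix(c(1, 2, 3,
              2, 4, 6), nrow = 2, byrow = TRUE)

row_sd <- apply(m, 1, sd)                     # one value per row
col_sd <- apply(m, 2, sd)                     # one value per column
row_sd_no_na <- apply(m, 1, sd, na.rm = TRUE) # na.rm is forwarded to sd()

row_sd
col_sd
```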

In [None]:
head(data)

### Saving the data frame to a file that can be imported into Excel
**write.csv** prints its required argument ‘x’ (after converting it to a data frame if it is not one nor a matrix) to a file or connection.

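**write.csv()** is a wrapper around **write.table()** with **sep** = "," and **dec** = "." fixed. For locales where Excel expects semicolon-separated files with decimal commas, **write.csv2()** uses **sep** = ";" and **dec** = "," instead.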

**write.table**(**x**, **file** = "", **append** = FALSE, **quote** = TRUE, **sep** = " ",
                 **eol** = "\n", **na** = "NA", **dec** = ".", **row.names** = TRUE,
                 **col.names** = TRUE, **fileEncoding** = "")
                 
    x: the object to be written, preferably a matrix or data frame.
          If not, it is attempted to coerce ‘x’ to a data frame.
 
    file: either a character string naming a file or a connection open
          for writing.  ‘""’ indicates output to the console.

    append: logical. Only relevant if ‘file’ is a character string.  If
          ‘TRUE’, the output is appended to the file.  If ‘FALSE’, any
          existing file of the name is destroyed.

    quote: a logical value (‘TRUE’ or ‘FALSE’) or a numeric vector.  If
          ‘TRUE’, any character or factor columns will be surrounded by
          double quotes.  If a numeric vector, its elements are taken
          as the indices of columns to quote.  In both cases, row and
          column names are quoted if they are written.  If ‘FALSE’,
          nothing is quoted.

     sep: the field separator string.  Values within each row of ‘x’
          are separated by this string.

     eol: the character(s) to print at the end of each line (row).  For
          example, ‘eol = "\r\n"’ will produce Windows' line endings on
          a Unix-alike OS, and ‘eol = "\r"’ will produce files as
          expected by Excel:mac 2004.

      na: the string to use for missing values in the data.

     dec: the string to use for decimal points in numeric or complex
          columns: must be a single character.

    row.names: either a logical value indicating whether the row names of
          ‘x’ are to be written along with ‘x’, or a character vector
          of row names to be written.

    col.names: either a logical value indicating whether the column names
          of ‘x’ are to be written along with ‘x’, or a character
          vector of column names to be written.  See the section on
          ‘CSV files’ for the meaning of ‘col.names = NA’.

    fileEncoding: character string: if non-empty declares the encoding to
          be used on a file (not a connection) so the character data
          can be re-encoded as they are written.  See ‘file’.


In [None]:
#write.csv(data,file="data/new_root_length.csv")
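
Since the input file used `;` as separator and `,` as decimal mark, one option is to write the result in the same convention with `write.csv2()`. The sketch below is only an illustration: it uses a hypothetical data frame and a temporary file rather than data/new_root_length.csv, and `row.names = FALSE` is a choice made here to avoid an extra unnamed first column.

```r
# Hypothetical data frame standing in for the processed data
demo <- data.frame(id = c("A", "B"), length_mean = c(1.5, 2.25))

# Temporary file so the example does not overwrite anything
out_file <- tempfile(fileext = ".csv")

write.csv2(demo, file = out_file, row.names = FALSE)

readLines(out_file)             # fields separated by ';', decimals written with ','
back <- read.csv2(out_file)     # read.csv2 uses the same convention
back
```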

# Exercises

1. Create a new column (lat_roots_factorized) with the factors "CLOSE_TO_NONE", "SLIGHTLY", "SO_SO", "SOME_WHAT", "MOSTLY" applied to the number of lateral roots. 

2. Create a new column (above_mean) with a boolean value (TRUE or FALSE):
   - TRUE -> if the mean length of the individual is above the mean length of the dataset
   - FALSE -> if not.

3. Round all the numbers in your data frame (data) to 3 decimal places.

4. Create a new column (cof_varience) with the coefficient of variation of the lengths of each individual.
   - **Tip** - you can use the apply and myCofV functions.
   - **P.S.** Don't forget to import the second function from "my-functions.R" in the root folder.