# Linear Regression

## Using the Boston Dataset

In [1]:
using RDatasets

# Load the Boston dataset
ds = dataset("MASS", "Boston");

# The `ds` variable now contains the Boston housing dataset as a DataFrame

In [2]:
feature_names = names(ds)
println(feature_names)

["Crim", "Zn", "Indus", "Chas", "NOx", "Rm", "Age", "Dis", "Rad", "Tax", "PTRatio", "Black", "LStat", "MedV"]


In [3]:
describe(ds)

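| Row | variable |     mean |     min |  median |     max | nmissing |   eltype |
| ---:| --------:| --------:| -------:| -------:| -------:| --------:| --------:|
|     |   Symbol |  Float64 |    Real | Float64 |    Real |    Int64 | DataType |
|   1 |     Crim |  3.61352 | 0.00632 | 0.25651 | 88.9762 |        0 |  Float64 |
|   2 |       Zn |  11.3636 |     0.0 |     0.0 |   100.0 |        0 |  Float64 |
|   3 |    Indus |  11.1368 |    0.46 |    9.69 |   27.74 |        0 |  Float64 |
|   4 |     Chas |  0.06917 |     0.0 |     0.0 |     1.0 |        0 |    Int64 |
|   5 |      NOx | 0.554695 |   0.385 |   0.538 |   0.871 |        0 |  Float64 |
|   6 |       Rm |  6.28463 |   3.561 |  6.2085 |    8.78 |        0 |  Float64 |
|   7 |      Age |  68.5749 |     2.9 |    77.5 |   100.0 |        0 |  Float64 |
|   8 |      Dis |  3.79504 |  1.1296 | 3.20745 | 12.1265 |        0 |  Float64 |
|   9 |      Rad |  9.54941 |     1.0 |     5.0 |    24.0 |        0 |    Int64 |
|  10 |      Tax |  408.237 |   187.0 |   330.0 |   711.0 |        0 |    Int64 |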
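
`describe` also accepts the statistics to compute as extra arguments, which keeps the summary narrow when only a few numbers matter. The sketch below is a minimal illustration, not part of the original notebook: `sample` is a hypothetical stand-in for `ds`, built by hand from three columns of the first five Boston rows so that it runs without downloading anything.

```julia
using DataFrames

# Stand-in for `ds`: three columns taken from the first five rows of the Boston data.
sample = DataFrame(
    Rm   = [6.575, 6.421, 7.185, 6.998, 7.147],
    Tax  = [296, 242, 242, 222, 222],
    MedV = [24.0, 21.6, 34.7, 33.4, 36.2],
)

# Ask for specific statistics instead of the default set.
stats = describe(sample, :mean, :std, :min, :max)
println(stats)
```

The same call on the full dataset would be `describe(ds, :mean, :std, :min, :max)`.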


In [6]:
# Access and print the target variable column ('MedV')
target = ds[:, :MedV];
println(target[1:10]) # Print the first 10 values

[24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9]
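
The cell above uses `ds[:, :MedV]`, which returns a copy of the column. The sketch below contrasts that with `ds[!, :MedV]`, which returns the stored column without copying. It is an illustration only: `sample` is a hypothetical stand-in for `ds` holding the first three `MedV` values.

```julia
using DataFrames

# Stand-in for `ds`, holding the first three MedV values printed above.
sample = DataFrame(MedV = [24.0, 21.6, 34.7])

copied = sample[:, :MedV]  # `:` as the row selector returns a new vector
shared = sample[!, :MedV]  # `!` returns the stored column without copying

copied[1] = -1.0           # changes only the copy
println(sample.MedV[1])    # the DataFrame still holds the original value

println(shared === sample.MedV)  # both names refer to the stored column
```

Taking a copy is the safer default when the target vector will be modified later. The `!` form avoids the allocation when it is only read.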


📋 Note:

```julia
using DataFrames
```

The dataset is already a DataFrame with named columns. For demonstration, if the data and the column names were held in separate objects, you might try to create a DataFrame like this:

```julia
df = DataFrame(ds.data, Symbol.(ds.feature_names))
```

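The line above, suggested by ChatGPT, does not work here: `ds` is already a DataFrame and has no `data` or `feature_names` fields. Instead, let's bind `df` to `ds`, so that `df` is simply another name for the same DataFrame rather than a copy.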
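
For reference, the constructor pattern does work when the data really is a separate matrix with a separate vector of names. The sketch below is a minimal illustration under that assumption: `data` and `feature_names` are hypothetical stand-ins filled with a few values from the first two Boston rows, not objects provided by RDatasets.

```julia
using DataFrames

# Hypothetical stand-ins: a plain matrix of values and a separate vector of column names.
data = [0.00632 18.0 24.0;
        0.02731  0.0 21.6]
feature_names = ["Crim", "Zn", "MedV"]

# Build a DataFrame with one column per matrix column, named from `feature_names`.
df_from_parts = DataFrame(data, Symbol.(feature_names))
println(names(df_from_parts))
```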

In [7]:
df = ds

# Display the first few rows of the DataFrame
first(df, 5)

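| Row |    Crim |      Zn |   Indus |  Chas |     NOx |      Rm |     Age |     Dis |   Rad |   Tax | PTRatio |   Black |   LStat |    MedV |
| ---:| -------:| -------:| -------:| -----:| -------:| -------:| -------:| -------:| -----:| -----:| -------:| -------:| -------:| -------:|
|     | Float64 | Float64 | Float64 | Int64 | Float64 | Float64 | Float64 | Float64 | Int64 | Int64 | Float64 | Float64 | Float64 | Float64 |
|   1 | 0.00632 |    18.0 |    2.31 |     0 |   0.538 |   6.575 |    65.2 |    4.09 |     1 |   296 |    15.3 |   396.9 |    4.98 |    24.0 |
|   2 | 0.02731 |     0.0 |    7.07 |     0 |   0.469 |   6.421 |    78.9 |  4.9671 |     2 |   242 |    17.8 |   396.9 |    9.14 |    21.6 |
|   3 | 0.02729 |     0.0 |    7.07 |     0 |   0.469 |   7.185 |    61.1 |  4.9671 |     2 |   242 |    17.8 |  392.83 |    4.03 |    34.7 |
|   4 | 0.03237 |     0.0 |    2.18 |     0 |   0.458 |   6.998 |    45.8 |  6.0622 |     3 |   222 |    18.7 |  394.63 |    2.94 |    33.4 |
|   5 | 0.06905 |     0.0 |    2.18 |     0 |   0.458 |   7.147 |    54.2 |  6.0622 |     3 |   222 |    18.7 |   396.9 |    5.33 |    36.2 |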
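
Because `df = ds` only adds a second name for the same object, any change made through `df` is also visible through `ds`. The sketch below illustrates the difference between such an alias and an independent copy made with `copy`. The names `ds_small`, `alias`, and `independent` are hypothetical, and the values come from the first two rows shown above.

```julia
using DataFrames

# Stand-in for `ds`, using values from the first two rows shown above.
ds_small = DataFrame(Rm = [6.575, 6.421], MedV = [24.0, 21.6])

alias = ds_small              # second name for the same DataFrame
independent = copy(ds_small)  # new DataFrame with its own column vectors

alias.MedV[1] = 0.0           # modifies the shared data

println(alias === ds_small)       # same object
println(ds_small.MedV[1])         # sees the change made through `alias`
println(independent.MedV[1])      # unaffected by the change
```

Use `copy(ds)` instead of plain assignment whenever the original data should stay untouched.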


In [12]:
? RDatasets # Help

search: [0m[1mR[22m[0m[1mD[22m[0m[1ma[22m[0m[1mt[22m[0m[1ma[22m[0m[1ms[22m[0m[1me[22m[0m[1mt[22m[0m[1ms[22m



No docstring found for module `RDatasets`.

# Exported names

`AbstractDataFrame`, `All`, `AsTable`, `Between`, `ByRow`, `Cols`, `DataFrame`, `DataFrameRow`, `DataFrames`, `GroupedDataFrame`, `InvertedIndex`, `InvertedIndices`, `Missing`, `MissingException`, `Missings`, `Not`, `PrettyTables`, `SubDataFrame`, `Tables`, `aggregate`, `allcombinations`, `allowmissing`, `allowmissing!`, `antijoin`, `by`, `coalesce`, `colmetadata`, `colmetadata!`, `colmetadatakeys`, `columnindex`, `combine`, `completecases`, `crossjoin`, `dataset`, `delete!`, `deletecolmetadata!`, `deletemetadata!`, `describe`, `disallowmissing`, `disallowmissing!`, `dropmissing`, `dropmissing!`, `emptycolmetadata!`, `emptymetadata!`, `emptymissing`, `fillcombinations`, `flatten`, `groupby`, `groupcols`, `groupindices`, `innerjoin`, `insertcols`, `insertcols!`, `ismissing`, `leftjoin`, `leftjoin!`, `levels`, `mapcols`, `mapcols!`, `metadata`, `metadata!`, `metadatakeys`, `missing`, `missings`, `ncol`, `nonmissingtype`, `nonunique`, `nrow`, `order`, `outerjoin`, `passmissing`, `proprow`, `rename`, `rename!`, `repeat!`, `rightjoin`, `rownumber`, `select`, `select!`, `semijoin`, `skipmissings`, `subset`, `subset!`, `transform`, `transform!`, `unique!`, `unstack`, `valuecols`

# Displaying contents of readme found at `/Users/valiha/.julia/packages/RDatasets/fNG6F/README.md`

# RDatasets.jl

The RDatasets package provides an easy way for Julia users to experiment with most of the standard data sets that are available in the core of R as well as datasets included with many of R's most popular packages. This package is essentially a simplistic port of the Rdatasets repo created by Vincent Arel-Bundock, who gathered data sets from many of the standard R packages in one convenient location on GitHub at https://github.com/vincentarelbundock/Rdatasets

In order to load one of the data sets included in the RDatasets package, you will need to have the `DataFrames` package installed. This package is automatically installed as a dependency of the `RDatasets` package if you install `RDatasets` as follows:

```julia
using Pkg  # the Pkg module must be loaded before calling Pkg.add
Pkg.add("RDatasets")
```

After installing the RDatasets package, you can then load data sets using the `dataset()` function, which takes the name of a package and a data set as arguments:

```julia
using RDatasets
iris = dataset("datasets", "iris")
neuro = dataset("boot", "neuro")
```

# Data Sets

The `RDatasets.packages()` function returns a table of represented R packages:

|      Package |                                                                         Title |
| ------------:| -----------------------------------------------------------------------------:|
|        COUNT |                                      Functions, data and code for count data. |
|        Ecdat |                                                    Data sets for econometrics |
|        HSAUR |                      A Handbook of Statistical Analyses Using R (1st Edition) |
|     HistData |               Data sets from the history of statistics and data visualization |
|         ISLR |       Data for An Introduction to Statistical Learning with Applications in R |
|       KMsurv |               Data sets from Klein and Moeschberger (1997), Survival Analysis |
|         MASS |                 Support Functions and Datasets for Venables and Ripley's MASS |
|     SASmixed |                                  Data sets from "SAS System for Mixed Models" |
|        Zelig |                                               Everyone's Statistical Software |
| adehabitatLT |                                                  Analysis of Animal Movements |
|         boot |                        Bootstrap Functions (Originally by Angelo Canty for S) |
|          car |                                               Companion to Applied Regression |
|      cluster |                                    Cluster Analysis Extended Rousseeuw et al. |
|     datasets |                                                        The R Datasets Package |
|       gamair | Datasets used in the book Generalized Additive Models: An Introduction with R |
|          gap |                                                      Genetic analysis package |
|      ggplot2 |                                  An Implementation of the Grammar of Graphics |
|      lattice |                                                              Lattice Graphics |
|         lme4 |                                Linear mixed-effects models using Eigen and S4 |
|         mgcv |         Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness estimation |
|       mlmRev |                            Examples from Multilevel Modelling Software Review |
|        nlreg |                   Higher Order Inference for Nonlinear Heteroscedastic Models |
|          plm |                                                  Linear Models for Panel Data |
|         plyr |                              Tools for splitting, applying and combining data |
|         pscl |               Political Science Computational Laboratory, Stanford University |
|        psych |          Procedures for Psychological, Psychometric, and Personality Research |
|     quantreg |                                                           Quantile Regression |
|     reshape2 |                       Flexibly Reshape Data: A Reboot of the Reshape Package. |
|   robustbase |                                                       Basic Robust Statistics |
|        rpart |                                   Recursive Partitioning and Regression Trees |
|     sandwich |                                           Robust Covariance Matrix Estimators |
|          sem |                                                    Structural Equation Models |
|     survival |                                                             Survival Analysis |
|          vcd |                                                  Visualizing Categorical Data |

The `RDatasets.datasets()` function returns a table describing the 700+ included datasets. Pass a package name (e.g. `RDatasets.datasets("mlmRev")`) to get a table for that package only:

| Package |       Dataset |                                                 Title |  Rows | Columns |
| -------:| -------------:| -----------------------------------------------------:| -----:| -------:|
|  mlmRev |        Chem97 |                   Scores on A-level Chemistry in 1997 | 31022 |       8 |
|  mlmRev | Contraception |                       Contraceptive use in Bangladesh |  1934 |       6 |
|  mlmRev |         Early |                    Early childhood intervention study |   309 |       4 |
|  mlmRev |          Exam |                         Exam scores from inner London |  4059 |      10 |
|  mlmRev |        Gcsemv |                                       GCSE exam score |  1905 |       5 |
|  mlmRev |         Hsb82 |                         High School and Beyond - 1982 |  7185 |       8 |
|  mlmRev |         Mmmec |                   Malignant melanoma deaths in Europe |   354 |       6 |
|  mlmRev |        Oxboys |                             Heights of Boys in Oxford |   234 |       4 |
|  mlmRev |      ScotsSec |                      Scottish secondary school scores |  3435 |       6 |
|  mlmRev |           bdf |       Language Scores of 8-Graders in The Netherlands |  2287 |      28 |
|  mlmRev |      egsingle |                           US Sustaining Effects study |  7230 |      12 |
|  mlmRev |       guImmun |                             Immunization in Guatemala |  2159 |      13 |
|  mlmRev |      guPrenat |                            Prenatal care in Guatemala |  2449 |      15 |
|  mlmRev |          star | Student Teacher Achievement Ratio (STAR) project data | 26796 |      18 |

# Licensing and Intellectual Property

Following Vincent's lead, we have assumed that all of the data sets in this repository can be made available under the GPL-3 license. If you know that one of the datasets released here should not be released publicly or if you know that a data set can only be released under a different license, please contact me so that I can remove the data set from this repository.
