# Descriptive statistics in R. Example 2
## Nominal data example
**Task 5.** "Multitabs". In a study of the vitamin and mineral complex "Multitabs" for the prevention of influenza, the main group (taking the drug) comprised 160 adults and the control group (not taking the drug) comprised 100 people. During the epidemic, 5 people fell ill in the main group and 9 in the control group. Calculate the sample proportions and their confidence intervals.

### 1.	Determine the variable type
Each participant’s outcome takes one of a finite set of unordered categories (here two: `Disease` or `Healthy`), so the variable is of the nominal type.
### 2.	Input the data in R.
Prepare the data as CSV (comma-separated values) text and read it into a data frame with the `read.csv` function.

In [1]:
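# CSV text: the first column holds the row names, the other two hold the counts
Input = ("
         Case, Main Group, Control Group
         Disease, 5, 9
         Healthy, 155, 91"
        )
# strip.white = TRUE removes the padding around each field,
# so the row names become "Disease" and "Healthy" without leading spaces
Data = read.csv(textConnection(Input), header = TRUE, row.names = 1, strip.white = TRUE)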
Data

|  | Main.Group | Control.Group |
|---|---|---|
| Disease | 5 | 9 |
| Healthy | 155 | 91 |
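
As a hedged alternative, the same table of counts can be built directly with `matrix()`, which avoids parsing text altogether. This is only a sketch; the object name `Data.alt` is an assumption introduced here for illustration and is not used elsewhere in this example.

```r
# Build the 2x2 table of counts directly (Data.alt is an illustrative name).
# matrix() fills column by column: first Main.Group, then Control.Group.
Data.alt = matrix(c(5, 155, 9, 91), nrow = 2,
                  dimnames = list(c("Disease", "Healthy"),
                                  c("Main.Group", "Control.Group")))
Data.alt = as.data.frame(Data.alt)

# Sanity check: the column totals should match the group sizes from the task
colSums(Data.alt)
```

Checking the column totals (160 and 100) against the task statement is a quick way to catch a mistyped count before any statistics are computed.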


### 2.	Calculate the proportions
To get the proportions for the groups in `Data`, convert `Data` into a matrix and pass it to the `prop.table` function with the additional argument `margin = 2`. Setting `margin` to 2 tells the function to divide each count by the sum of its column, that is, by the total number of people in the corresponding group.

In [2]:
Data.prop = prop.table(as.matrix(Data), margin = 2)
Data.prop

|  | Main.Group | Control.Group |
|---|---|---|
| Disease | 0.03125 | 0.09 |
| Healthy | 0.96875 | 0.91 |
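
To make the effect of `margin = 2` explicit, the column-wise proportions can also be computed by hand. The sketch below rebuilds the counts so that it runs on its own; the names `counts` and `manual.prop` are assumptions introduced for illustration.

```r
# Counts from the task (rows: outcome, columns: group)
counts = matrix(c(5, 155, 9, 91), nrow = 2,
                dimnames = list(c("Disease", "Healthy"),
                                c("Main.Group", "Control.Group")))

# Divide every column by its own total, which is what margin = 2 requests
manual.prop = sweep(counts, 2, colSums(counts), "/")
manual.prop

# Compare the manual result with prop.table
all.equal(manual.prop, prop.table(counts, margin = 2))
```

Each column of the result sums to 1, because the proportions are taken within a group rather than over the whole table.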


### 3.	Calculate Confidence Intervals for the proportions
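To obtain the confidence intervals, run the one-sample `binom.test` function on each group, passing the number of ill people and the group size. By default it returns an exact (Clopper-Pearson) 95% confidence interval. The p-value in the output tests the null hypothesis that the probability of illness equals 0.5 and is not relevant to this task.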

In [3]:
mg_summary = binom.test(Data$Main.Group[1], sum(Data$Main.Group))
mg_summary


	Exact binomial test

data:  Data$Main.Group[1] and sum(Data$Main.Group)
number of successes = 5, number of trials = 160, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.01022314 0.07141770
sample estimates:
probability of success 
               0.03125 


In [4]:
cg_summary = binom.test(Data$Control.Group[1], sum(Data$Control.Group))
cg_summary


	Exact binomial test

data:  Data$Control.Group[1] and sum(Data$Control.Group)
number of successes = 9, number of trials = 100, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.0419836 0.1639823
sample estimates:
probability of success 
                  0.09 
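
The exact interval reported by `binom.test` can also be reproduced from quantiles of the beta distribution, which shows where the bounds come from. This is a minimal sketch; the helper name `exact.ci` is hypothetical and introduced here only for illustration.

```r
# Hypothetical helper: exact (Clopper-Pearson) interval from beta quantiles
# x = number of ill people, n = group size
exact.ci = function(x, n, conf.level = 0.95) {
  alpha = 1 - conf.level
  lower = if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
  upper = if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
  c(lwr.CI = lower, upr.CI = upper)
}

exact.ci(5, 160)   # main group
exact.ci(9, 100)   # control group
```

The bounds should agree with the `95 percent confidence interval` lines printed by `binom.test` above.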


### 4.	Gathering the results
Let’s gather the results obtained in the previous steps in a single place. We’ll construct two lists with the elements `Proportion`, `lwr.CI`, and `upr.CI`, holding the *proportion*, the *lower confidence interval bound*, and the *upper confidence interval bound*. Note that the new lists overwrite the `cg_summary` and `mg_summary` test objects.

In [5]:
cg_summary = list(Proportion = cg_summary$estimate, 
                  lwr.CI = cg_summary$conf.int[1], 
                  upr.CI = cg_summary$conf.int[2])
mg_summary = list(Proportion = mg_summary$estimate, 
                  lwr.CI = mg_summary$conf.int[1], 
                  upr.CI = mg_summary$conf.int[2])

The final table is built by combining the control group and main group lists column-wise with the `cbind` function.

In [6]:
Data.DescrStat = as.data.frame(cbind(Control.Group = cg_summary, Main.Group = mg_summary))
Data.DescrStat

|  | Control.Group | Main.Group |
|---|---|---|
| Proportion | 0.09 | 0.03125 |
| lwr.CI | 0.0419836 | 0.01022314 |
| upr.CI | 0.1639823 | 0.0714177 |
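
Combining lists with `cbind` yields columns that hold list elements rather than plain numbers, which can be awkward for further arithmetic. One possible alternative, sketched below under the assumption that a plain numeric matrix is acceptable, applies a small helper to both groups at once. The names `prop.ci`, `cases`, `sizes`, and `Summary.alt` are hypothetical.

```r
# Disease counts and group sizes from the task
cases = c(Control.Group = 9, Main.Group = 5)
sizes = c(Control.Group = 100, Main.Group = 160)

# Hypothetical helper: proportion and 95% CI for one group
prop.ci = function(x, n) {
  bt = binom.test(x, n)
  c(Proportion = unname(bt$estimate),
    lwr.CI = bt$conf.int[1],
    upr.CI = bt$conf.int[2])
}

# mapply calls the helper once per group; the columns take the group names
Summary.alt = mapply(prop.ci, cases, sizes)
round(Summary.alt, 4)
```

The result is a numeric 3-by-2 matrix with the same row and column labels as `Data.DescrStat`, so individual values can be used directly, for example `Summary.alt["upr.CI", "Main.Group"]`.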


### 5.	Interpreting data
Start the interpretation by comparing the proportions, then check whether the confidence intervals of the groups overlap. If they do, the conclusion has to be formulated as a hypothesis. Otherwise the conclusion can be stated directly, although we would still suggest performing the appropriate hypothesis tests to make the analysis rigorous.

The Control Group has a higher proportion of ill people than the Main Group (9% versus about 3.1%). This suggests that taking the drug may help the immune system resist influenza.
The confidence intervals (CI) of the groups overlap (the upper bound of the Main Group CI, about 0.071, lies inside the Control Group CI, about 0.042 to 0.164). This means we cannot state our finding as established without formulating and testing statistical hypotheses, but we can formulate the conclusion as a medical hypothesis.
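
The overlap check described here can be written as a short comparison of the interval bounds. The sketch below copies the bounds from the outputs above; the variable names are assumptions introduced for illustration.

```r
# Interval bounds as reported in the outputs above
main.ci    = c(lwr = 0.01022314, upr = 0.07141770)
control.ci = c(lwr = 0.0419836,  upr = 0.1639823)

# Two intervals overlap when each one starts before the other one ends
overlap = main.ci[["lwr"]] <= control.ci[["upr"]] &&
          control.ci[["lwr"]] <= main.ci[["upr"]]
overlap   # TRUE: the intervals overlap
```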

### Conclusion:
Taking the drug can reduce the incidence of influenza almost threefold compared with not taking it (about 3.1% in the main group versus 9% in the control group).

**Note:** the word “can” in the conclusion is just one of the many ways to formulate the conclusion as a hypothesis.
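
The "almost 3 times" figure in the conclusion corresponds to the ratio of the two sample proportions. A minimal check, with variable names chosen here only for illustration:

```r
p.control = 9 / 100   # proportion of ill people in the control group
p.main    = 5 / 160   # proportion of ill people in the main group

# Ratio of the sample proportions (control relative to main)
p.control / p.main    # 2.88
```

This ratio is a point estimate from the samples only; it carries no information about uncertainty, which is why the conclusion is phrased as a hypothesis.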