---
title: "Exploring one Variable"
author: "Shankar"
date: "October 1, 2016"
output:
  html_document:
    keep_md: true
---
This is an R Markdown document that I used while learning data analysis from Udacity. It explores one variable at a time.
```{r}
fbdata <- read.csv("https://s3.amazonaws.com/udacity-hosted-downloads/ud651/pseudo_facebook.tsv", sep = '\t')
```
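As a small illustration of the import step, the sketch below parses a made-up tab-separated string (the `toy_tsv` contents are hypothetical, not rows from the real file). It shows that `read.csv()` with `sep = "\t"` behaves like `read.delim()`, and that the literal text `NA` is read as a missing value.

```{r}
# Hypothetical three-row TSV, used only to illustrate the parsing behaviour
toy_tsv <- "userid\tage\tgender\n1\t14\tmale\n2\t15\tfemale\n3\t16\tNA"

# Same call style as above, but reading from a string instead of a URL
toy_df <- read.csv(text = toy_tsv, sep = "\t")
str(toy_df)

# read.delim() uses a tab separator by default
toy_df2 <- read.delim(text = toy_tsv)
identical(toy_df, toy_df2)

# The text "NA" in the gender column becomes a real missing value
is.na(toy_df$gender)
```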
Histogram of birthdays
```{r}
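# Save figures under README_figs/ with a README- prefix
knitr::opts_chunk$set(fig.path = "README_figs/README-")
library(ggplot2)
names(fbdata)
# dob_day is numeric, so a continuous scale is used to label days 1 to 31
qplot(dob_day, data = fbdata) +
  scale_x_continuous(breaks = 1:31) +
  facet_wrap(~dob_month, ncol = 3)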
```
Friend count by gender
```{r}
# qplot(friend_count, data = fbdata, xlim = c(0, 1000))
qplot(friend_count, data = subset(fbdata, !is.na(gender)), binwidth = 25) +
  scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50)) +
  facet_wrap(~gender)
```
Table of gender count
```{r}
table(fbdata$gender)
```
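A small sketch of how `table()` treats missing values, using a made-up vector rather than the real data (the `toy_gender` values are hypothetical). By default `table()` leaves `NA` out of the counts; `useNA = "ifany"` keeps it as its own cell, and `prop.table()` turns counts into proportions.

```{r}
# Hypothetical toy vector, not taken from pseudo_facebook.tsv
toy_gender <- c("male", "female", "male", NA, "female", "male")

# Default: NA values are left out of the counts
counts <- table(toy_gender)
counts

# useNA = "ifany" adds a separate cell for the missing values
counts_with_na <- table(toy_gender, useNA = "ifany")
counts_with_na

# Share of each non-missing category
prop.table(counts)
```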
Summary of friend count by gender
```{r}
by(fbdata$friend_count,fbdata$gender, summary)
```
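`by()` splits its first argument by the grouping variable and applies the function to each piece. The sketch below uses made-up numbers (the `toy` data frame is hypothetical) and adds `tapply()`, which does the same split for a single statistic. It also shows why the median is often preferred for long-tailed counts: one large value moves the mean much more than the median.

```{r}
# Hypothetical toy data, used only to illustrate the split-and-summarise step
toy <- data.frame(
  friend_count = c(0, 10, 20, 30, 400, 5, 15, 25),
  gender = c("female", "female", "female", "female", "female",
             "male", "male", "male")
)

# Same call pattern as above: one summary per gender
by(toy$friend_count, toy$gender, summary)

# tapply() returns a single statistic per group as a named vector
medians <- tapply(toy$friend_count, toy$gender, median)
means <- tapply(toy$friend_count, toy$gender, mean)

# The single value of 400 pulls the female mean far above the median
medians
means
```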
Scaling and transforming variables
```{r}
library(gridExtra)
# Untransformed, log10, and square-root x axes, stacked for comparison
p1 <- qplot(friend_count, data = subset(fbdata, !is.na(gender))) + facet_wrap(~gender)
p2 <- p1 + scale_x_log10() + xlab("Log Transformed")
p3 <- p1 + scale_x_sqrt() + xlab("Square Root Transformed")
grid.arrange(p1, p2, p3, ncol = 1)
```
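To see what the two transformations do to long-tailed values, here is a minimal sketch on a made-up vector (`toy_counts` is hypothetical). Note that `log10(0)` is `-Inf`, so zero counts have no finite position on a log axis. Adding 1 before taking the log is one common workaround, shown here as an assumption rather than something the chunk above does.

```{r}
# Hypothetical long-tailed counts, including a zero
toy_counts <- c(0, 1, 9, 99, 999)

# Plain log10: the zero becomes -Inf
log_plain <- log10(toy_counts)
log_plain

# Shifting by 1 keeps the zero finite (log10(1) is 0)
log_shifted <- log10(toy_counts + 1)
log_shifted

# The square root compresses large values less strongly and is defined at zero
sqrt_vals <- sqrt(toy_counts)
sqrt_vals
```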
Frequency polygons
```{r}
# y is each bin's count divided by the total count, i.e. a proportion
ggplot(aes(x = friend_count, y = ..count../sum(..count..)), data = subset(fbdata, !is.na(gender))) +
  geom_freqpoly(aes(color = gender), binwidth = 10) +
  scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50)) +
  xlab('Friend Count') +
  ylab('Proportion of users with that friend count')
```
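The y aesthetic above divides each bin's count by the sum of the counts. The arithmetic can be sketched in base R for a single made-up vector (`toy_friends` is hypothetical, and the bin edges are chosen for the sketch, so they need not match the bins ggplot2 picks).

```{r}
# Hypothetical friend counts, used only to illustrate the calculation
toy_friends <- c(3, 7, 12, 18, 25, 31, 38, 44, 52, 95)

# Bin into widths of 10 without drawing anything
bins <- hist(toy_friends, breaks = seq(0, 100, by = 10), plot = FALSE)

# Count per bin divided by the total count gives a proportion per bin
proportions <- bins$counts / sum(bins$counts)

data.frame(bin_mid = bins$mids, count = bins$counts, proportion = proportions)
```

The proportions sum to 1, which is what makes the curves comparable when the groups differ in size.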
```{r}
ggplot(aes(x = www_likes, y = ..count../sum(..count..)), data = subset(fbdata, !is.na(gender))) +
  geom_freqpoly(aes(color = gender)) +
  scale_x_log10()
```
Box plot of friend count by gender
Points that lie more than 1.5 times the IQR beyond the quartiles are drawn as outliers.
```{r}
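# ylim drops observations outside 0 to 1000 before the box plot statistics are computed
qplot(x = gender, y = friend_count, data = subset(fbdata, !is.na(gender)),
      geom = 'boxplot', ylim = c(0, 1000))
# coord_cartesian only zooms the view; the statistics still use all observations
qplot(x = gender, y = friend_count, data = subset(fbdata, !is.na(gender)),
      geom = 'boxplot') + coord_cartesian(ylim = c(0, 1000))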
```
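The 1.5 times IQR rule can be worked through by hand. This is a minimal sketch on made-up values (`toy_values` is hypothetical), using `quantile()` for the quartiles and comparing against `boxplot.stats()`, the helper behind base R's `boxplot()`.

```{r}
# Hypothetical values with one obvious extreme point
toy_values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

# Quartiles and interquartile range
q <- quantile(toy_values, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Fences sit 1.5 * IQR below the first and above the third quartile
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr

# Anything outside the fences is flagged as an outlier
outliers <- toy_values[toy_values < lower_fence | toy_values > upper_fence]
outliers

# boxplot.stats() works from hinges, which can differ slightly from
# quantile() quartiles, but it flags the same point here
boxplot.stats(toy_values)$out
```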
Summary of the pseudo-Facebook data
```{r}
summary(fbdata)
```