szlevente/DASI-Project
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
<!-- Make sure that the knitr package is installed and loaded. --> <!-- For more info on the package options see http://yihui.name/knitr/options --> ### Religiosity and formal education <!-- The code required to load the data--> ```{r echo=FALSE} load(url("http://bit.ly/dasi_gss_data")) ``` ### Introduction: In this report, we try to answer the question whether there is a connection between someone's highest level of formal education and their religiosity (as measured by how often they attend religious services) It has often been stated that having a higher formal education correlates with lower church attendance. Using the GSS (General Social Survey) data, we can assert the correctness of this affirmation. ### Data: The General Social Survey (GSS) is a sociological survey used to collect data on demographic characteristics and attitudes of residents of the United States. The survey is conducted face-to-face with an in-person interview by the National Opinion Research Center at the University of Chicago, of adults (18+) in randomly selected households. The survey was conducted every year from 1972 to 1994 (except in 1979, 1981, and 1992). Since 1994, it has been conducted every other year. The survey takes about 90 minutes to administer. (Source: Wikipedia) The cases in this study are adult (18+) residents of the United States. Two variables are of interest in this study, namely degree - respondents highest degree - a categorical variable ( less than high school, high school, junior college, bachelor, graduate) and "attend" - how often does the respondent attend religious services, also a categorical variable, with values such as never, once a year, once a week, etc The GSS study is an observational study. The collection of the data did not directly interfere with the way the data arose. People were randomly selected for surveying, but there was no control group. Results of the survey were recorded, but not compared against a control group. The population of interest is the adult population of the United States. As the study is done on a representative random sample, the findings from the analysis can be generalized to the entire adult population of the US. Potential sources of bias at the collection of the data might include - non-response bias, if a non-random subsection of the respondents chose not to answer the survey, and conveniance bias, if people who were easy to sample were more likely to be included in the sample. Since the study is an observational study and not an experiment, causal links cannot be determined, just correlations (associations). To be able to determine causality, an experiment , a control group and a "treatment" is needed (for example, a double blind, placebo controlled experiment), which is not the case in this study. ### Exploratory data analysis: ```{r} plot(gss$attend ~ gss$degree) by(gss$attend, gss$degree, summary) ``` A plot showing the proportion of religious services attendance across each educational degree shows that the proportions are somewhat similar - suggesting that there might be differences between attendance as formal degree changes, however, more precise measuments need to be performed in order to draw appropriate conclusions. ### Inference: The null hypothesis is that there are no differences in the proportions of attendace as formal education degree changes. In other words, religious service attendance is independent of the formal educational degree obtained. The alternative hypothesis is that the two variables are dependent of each other, there is a correspondence (however, causality still cannot be determined). We are going to choose a 5% significance level, and apply the chi-squared test of independence. Conditions for the chi-square test: * Independence - sampled observations must be independent - fulfilled by the data collection method, according to GSS guidelines (random sample) * Number of cases is less than 10% of the population - GSS surveys about 55,000 respondents * Sample size - each cell has at least 5 expected cases (in fact, in our particular case we have way more than that in each cell) In the chi-square test, we are going to determine how different the expected values are from the observed values. If these differences are large, we can conclude that these deviations are unlikely to occur due to chance alone and therefore it might provide evidence for the alternative hypothesis. Based on the data above, we can proceed to clean and group the variables. If we eliminate the NA's and group attendance into two categories (for convenience) : those who attend less than once a week, and those who attend at least once a week, we get to the following table: | Less than once a week | At least once a week | Totals -------------| ---------------------|-----------------------|--- Lt High school | 6234 | 3145| 9379 High school | 16371 | 7908|24279 Junior college | 1774|836|2610 Bachelor |4395|2379|6774 Graduate |2074|1151|3225 We now have two categorical variables, one of which has two levels, and the other one has 5 level. We are going to apply a hypothesis test only (no confidence interval), and since the expected size condition is met, no simulation is needed. Degrees of freedom df = 4 x 1 = 4, the test statistic is 26.29 , and the p value is ```{r} chisq.test(data.frame(c(6234,16371,1774,4395,2074),c(3145,7908,836,2379,1151))) pchisq(26.29, df=4,lower.tail=FALSE) ``` Since no other methods are applicable we cannot compare whether the hypothesis test and the confidence interval agree. ### Conclusion: Based on the p value of 2.764e-05 we can reject the null hypothesis at the 5% significance level. Therefore, we can conclude that the data does present conclusive evidence that there is correspondence in the proportions of attendance as formal educational degree changes. At his significance level, religious service attendance does seem to depend on formal educational degree. The differences between observed and expected attendance in the sample can not be simply attributed to chance. Other possible areas of study might include taking into account the respondent's IQ instead of formal educational degree, as IQ tends to measure native intelligence, while formal education depends on a lot of factors, such as environmental factors, family background, social status and so on. Such studies have been done in the past (see referenced Wikipedia article). Admittedly however, introducing IQ measurement into the GSS might prove very difficult and possibly impractical. ### References: Smith, Tom W., Michael Hout, and Peter V. Marsden. General Social Survey, 1972-2012 [Cumulative File]. ICPSR34802-v1. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut /Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2013-09-11. doi:10.3886/ICPSR34802.v1 Persistent URL: http://doi.org/10.3886/ICPSR34802.v1 Wikipedia: http://en.wikipedia.org/wiki/General_Social_Survey Wikipedia: http://en.wikipedia.org/wiki/Religiosity_and_intelligence ### Appendix: ``` year age degree attend 1 1972 23 Bachelor Once A Year 2 1972 70 Lt High School Every Week 3 1972 48 High School Once A Month 4 1972 27 Bachelor <NA> 5 1972 61 High School <NA> 6 1972 26 High School Once A Year 7 1972 28 High School Every Week 8 1972 27 Bachelor <NA> 9 1972 21 High School Sevrl Times A Yr 10 1972 30 High School More Thn Once Wk 11 1972 30 High School Every Week 12 1972 56 Lt High School Every Week 13 1972 54 Lt High School Every Week 14 1972 49 Lt High School Every Week 15 1972 41 Lt High School More Thn Once Wk 16 1972 54 High School 2-3X A Month 17 1972 24 High School Every Week 18 1972 62 Lt High School Every Week 19 1972 46 Bachelor Every Week 20 1972 21 High School 2-3X A Month 21 1972 57 High School 2-3X A Month 22 1972 58 High School Once A Year 23 1972 21 High School Lt Once A Year 24 1972 26 High School <NA> 25 1972 71 Bachelor Sevrl Times A Yr 26 1972 53 High School Every Week 27 1972 42 High School Once A Year 28 1972 42 High School Once A Year 29 1972 20 High School Sevrl Times A Yr 30 1972 23 Lt High School Every Week ```
About
Data Analysis and Statistical Inference project
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published