Gkontopodis/Regression-Modelling-in-Practice

Regression Modelling in Practice

This repository introduces the third course of the Data Analysis and Interpretation Specialization, offered by Wesleyan University through Coursera. For grading purposes, the assignments were initially uploaded to Tumblr.

This course focuses on one of the most important tools in your data analysis arsenal: regression analysis. It covers examining multiple predictors, identifying confounding variables, interpreting regression coefficients, and using regression diagnostic plots, working with existing data from the U.S. National Epidemiological Survey on Alcohol and Related Conditions (NESARC).

For the code and the output I used Spyder (IDE). Requires Python 2.7+.

Course 4-Week Syllabus

  1. Introduction to Regression
  2. Basics of Linear Regression
  3. Multiple Regression
  4. Logistic Regression

Sample

The data was provided by the National Epidemiological Survey on Alcohol and Related Conditions (NESARC), which was conducted in a random sample of 43,093 U.S. adults and designed to determine the magnitude of alcohol use and psychiatric disorders. Sample size matters because, all else being equal, larger samples yield more precise estimates. NESARC’s unusually large sample also made it possible to obtain stable estimates of even rare conditions. NESARC participants came from all walks of life and a variety of ages, and the unit of analysis was the individual. They represented all regions of the United States and included residents of the District of Columbia, Alaska, and Hawaii. In addition to sampling individuals living in traditional households, NESARC investigators questioned military personnel living off base and people living in a variety of group accommodations such as boarding or rooming houses and college quarters. More specifically, the sample consists of 24,575 (57.0%) males and 18,518 (43.0%) females, of whom 9,535 (22.13%) were aged between 18 and 30 years old. The data analytic subset examined in this study includes individuals aged between 18 and 30 years old who reported using cannabis at least once in their life (N=2,412).
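As an illustration of how the data analytic subset described above could be selected, here is a minimal pandas sketch. It is not the code used for the assignments. The column names `AGE` and `S3BQ1A5` (lifetime cannabis use, with 1 taken to mean "yes") are assumptions about the NESARC codebook, and the toy rows are made up so the example runs without the real data file.

```python
import pandas as pd

# Hypothetical toy rows standing in for the NESARC file (the real data has 43,093 records).
# Assumed codebook names: AGE = age in years; S3BQ1A5 = ever used cannabis (1 = yes, 2 = no, 9 = unknown).
toy = pd.DataFrame({
    "AGE":     [19, 25, 30, 31, 45, 22, 18],
    "S3BQ1A5": [1,  2,  1,  1,  1,  9,  1],
})


def analytic_subset(df, age_col="AGE", cannabis_col="S3BQ1A5"):
    """Keep respondents aged 18 to 30 (inclusive) who reported lifetime cannabis use."""
    # Coerce to numeric so blank or text-coded entries become NaN and drop out of the mask.
    ages = pd.to_numeric(df[age_col], errors="coerce")
    used = pd.to_numeric(df[cannabis_col], errors="coerce")
    mask = ages.between(18, 30) & (used == 1)
    return df.loc[mask].copy()


subset = analytic_subset(toy)
print(len(subset))
```

With the real file, the same function would be applied to the DataFrame loaded from the NESARC CSV. Comparing the resulting row count against the reported N=2,412 is a quick check that the assumed column names and codes match the codebook.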

Procedure

In 2001-2002, the National Institute on Alcohol Abuse and Alcoholism (NIAAA) conducted the first wave of the National Epidemiological Survey on Alcohol and Related Conditions (NESARC), the largest and most ambitious survey of this type conducted to date. Information was collected in face-to-face computer-assisted interviews, which took place in the participants’ homes. The survey contained an extensive battery of questions about present and past alcohol consumption, alcohol use disorders (AUDs), and the use of alcohol treatment services. NESARC also included similar questions related to tobacco and illicit drug use (including nicotine dependence and drug use disorders), as well as questions designed to identify a wide variety of psychiatric disorders such as major depression, anxiety disorders, and personality disorders. The original purpose of the survey was to estimate the magnitude of, and better understand the link between, alcohol use and other drug use and/or psychiatric disorders, which can help treatment providers design more targeted screening and more effective treatments for their patients. The response rate was 81%, which is high by the standards of recent large-scale national surveys. A high response rate matters because it reduces the risk of nonresponse bias and so supports the validity of the survey results.