Retrieving, Processing, and Visualizing Data with Python
Niam Moltta
Latest commit 358679f May 1, 2018

Gender and the environment in Mexico

Capstone Project for Coursera Specialization: "Retrieving, Processing, and Visualizing Data with Python".

I downloaded data files from http://www.inegi.org.mx/ and designed a variable called ProAmbiente (Pro-environment), based on each household's level of consumption of polluting products and on how often the household paid for repairs instead of throwing things away. I wrote Python programs to read, extract, analyze, and visualize that data so that anyone can use them for their own purposes by entering the names of their own files.

Visualizing data:

Gender of the person who supports the household economically (sexo_jefe):

  1. Male
  2. Female

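Education (educa_jefe): From no schooling to a completed graduate degree (0-11).
  • Keys (original labels, with English glosses): 0) Nada (none), 1) Kinder (kindergarten), 2) Primaria (trunca) (elementary school, incomplete), 3) Primaria (terminada) (elementary school, completed), 4) Secundaria (trunca) (middle school, incomplete), 5) Secundaria (terminada) (middle school, completed), 6) Preparatoria (trunca) (high school, incomplete), 7) Preparatoria (terminada) (high school, completed), 8) Carrera técnica (terminada) (technical degree, completed), 9) Licenciatura (trunca) (bachelor's degree, incomplete), 10) Licenciatura (terminada) (bachelor's degree, completed), 11) Posgrado (terminado) (graduate degree, completed).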

Socioeconomic class (est_socio: determined by each home's physical properties): From Low to High (1-4).
  1. Baja (Low)
  2. Media baja (Lower middle class)
  3. Media alta (Upper middle class)
  4. Alta (High)


* Other interesting results (total_int = members per home):



The main INSTRUCTIONS for the programs are very simple:

  • The files must be in the same folder as the scripts.
  • Select file.
  • Select column header.
  • Select alpha (if applicable).
  • Enter 'ya' to quit.
  • etc.

All the programs have the same structure so you can use the same keywords to start/proceed/quit.
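
As a rough illustration of the "select file, select column header" steps, the sketch below reads one numeric column from CSV text using only the standard library. The function name `read_column` and the file name in the usage comment are hypothetical; the scripts' own loading code is not shown in this README and may well use a different library.

```python
import csv

def read_column(lines, header):
    """Return the values under `header` as floats, skipping blank cells.

    `lines` can be an open file or any iterable of CSV text lines.
    """
    reader = csv.DictReader(lines)
    values = []
    for row in reader:
        cell = (row[header] or "").strip()
        if cell:
            values.append(float(cell))
    return values

# Usage with a file in the same folder as the script (example file name):
# with open("my_data.csv", newline="") as f:
#     education = read_column(f, "educa_jefe")
```

The column names `sexo_jefe` and `educa_jefe` come from the data described above; any other header in your own file works the same way.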

  • MoCT.py

    • Returns measures of central tendency and related statistics: N, mean, standard deviation, standard error, etc.
    • Returns sampling distribution graph
    • Returns z-value and p-value from z-table
    • Returns z-score
    • Calculates one-tailed t-test
    • Returns confidence interval
    • Returns acceptance/rejection of the null hypothesis.


      Quiet demo here.
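
To make the quantities in this list concrete, here is a minimal sketch that computes them with the standard library. It is not the code in MoCT.py: the function name `describe` is hypothetical, and it uses a normal (z) approximation for both the one-tailed p-value and the confidence interval, whereas the script may rely on the t distribution.

```python
import math
import statistics
from statistics import NormalDist

def describe(sample, population_mean, alpha=0.05):
    """Summarize a sample and test its mean against `population_mean`."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)            # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)                   # standard error of the mean
    z = (mean - population_mean) / se        # z-value of the sample mean
    p = 1 - NormalDist().cdf(abs(z))         # one-tailed p-value (normal approx.)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (mean - z_crit * se, mean + z_crit * se)  # two-sided confidence interval
    return {
        "n": n, "mean": mean, "sd": sd, "se": se,
        "z": z, "p": p, "ci": ci,
        "reject_null": p < alpha,
    }
```

Calling `describe(values, population_mean=5)` returns a dictionary, so each statistic can be printed or plotted separately.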

  • DepT-test.py

    • Calculates two-tailed t-test
    • Returns column behavior graph
    • Returns differences of means graph
    • Calculates t-statistic
    • Returns Cohen's d
    • Returns acceptance/rejection of the null hypothesis
    • Returns confidence interval


      Quiet demo here.
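
The t-statistic and Cohen's d for a dependent (paired) samples test can be sketched as follows. This is an illustration under assumptions, not the script's code: the function name `dependent_t` is hypothetical, Cohen's d is taken as the mean difference divided by the standard deviation of the differences, and the p-value is left out because the standard library has no t distribution.

```python
import math
import statistics

def dependent_t(before, after):
    """Return (t statistic, degrees of freedom, Cohen's d) for paired data."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample SD of the differences
    t = mean_diff / (sd_diff / math.sqrt(n))   # paired t statistic
    cohens_d = mean_diff / sd_diff             # effect size for paired data
    return t, n - 1, cohens_d
```

The returned t value is then compared with the two-tailed critical value for `n - 1` degrees of freedom at the chosen alpha.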

  • ConverS.py

    • Value replacement (I used it to convert string codes into integer values)
      • Example: I converted keys like "K023", which refers to buying solar panels or having an alternative electricity source ("Compra e instalación de paneles solares y planta de luz propia", i.e. "Purchase and installation of solar panels and own power plant"), into a value that contributes to the overall score variable I created.
    • Note: You need to modify this code to convert your own data.
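
A value replacement of this kind can be sketched with a plain dictionary lookup. The mapping below is hypothetical: the README names the key "K023" but does not say which score it was given, so the value 1 is only a placeholder, as are the names `SCORES` and `convert`.

```python
# Hypothetical mapping; the real scores used in ConverS.py are not listed here.
SCORES = {"K023": 1}

def convert(values, mapping, default=0):
    """Replace each key with its score; unknown keys get `default`."""
    return [mapping.get(value, default) for value in values]
```

Editing the dictionary is the only change needed to adapt this sketch to other codes.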
  • RangeR.py

    • Assigns values to intervals
    • Returns minimum and maximum
    • Returns factors for that range
    • Returns new file with data split by intervals
    • Returns frequency and cumulative frequency for values in those intervals.
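
As an illustration of interval assignment with frequency and cumulative frequency, the sketch below splits the range between the minimum and maximum into equal-width intervals. The function name `bin_values` and the equal-width choice are assumptions; RangeR.py may pick its intervals differently. The sketch also assumes the data are not all identical, so the interval width is non-zero.

```python
from collections import Counter

def bin_values(values, n_intervals):
    """Return rows of (lower, upper, frequency, cumulative frequency)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals

    def index(value):
        # The maximum sits on the last upper edge, so clamp it into the last bin.
        return min(int((value - lo) / width), n_intervals - 1)

    counts = Counter(index(v) for v in values)
    table, cumulative = [], 0
    for i in range(n_intervals):
        cumulative += counts[i]
        table.append((lo + i * width, lo + (i + 1) * width, counts[i], cumulative))
    return table
```

Each interval is closed on the left and open on the right, except the last one, which also includes the maximum.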



  • SkewU.py

    • Skewness calculation
    • Returns skewness value and skewness graph
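
One common definition of skewness is the Fisher-Pearson coefficient, the third central moment divided by the second central moment raised to the power 1.5. The sketch below uses that definition; whether SkewU.py uses this exact (biased) form or a bias-corrected one is not stated in this README, and the function name is hypothetical.

```python
def skewness(values):
    """Fisher-Pearson coefficient of skewness (biased, population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5
```

A positive result means a longer right tail, a negative result a longer left tail, and a value near zero a roughly symmetric distribution.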



  • Boxy.py

    • Box-Cox transformation to reduce skewness
    • Returns a set of histograms to compare:
      • Original data histogram
      • Un-skewed data using 'sqrt' histogram
      • Un-skewed data using 'BoxCox' histogram
    • Returns file with new data (using BoxCox or sqrt, optional)
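
The two transformations compared above can be sketched as follows. The Box-Cox formula is (x^λ - 1) / λ for λ ≠ 0 and ln(x) for λ = 0, defined for positive x. In this sketch λ is passed in by the caller; how Boxy.py chooses λ is not described here (SciPy's `scipy.stats.boxcox`, for example, can estimate it from the data), and the function names are hypothetical.

```python
import math

def box_cox(values, lam):
    """Apply the Box-Cox transformation with a caller-supplied lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def sqrt_transform(values):
    """Square-root transformation, a simpler way to reduce right skew."""
    return [math.sqrt(v) for v in values]
```

Plotting histograms of the original list and of both outputs gives the kind of side-by-side comparison described in the list.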



  • Stan.py

    • Performs standardization of data
    • Returns comparison graphs
    • Returns new file with standardized data
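
Standardization usually means converting each value to a z-score, (x - mean) / standard deviation. The sketch below assumes that meaning and uses the sample standard deviation; Stan.py may use the population form instead, and the function name is hypothetical.

```python
import statistics

def standardize(values):
    """Return z-scores: mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]
```

After this step, columns measured on different scales can be compared directly.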



  • PeaR.py (New error dealing with zeros)

    • Returns Pearson correlation coefficient
    • Returns p-value
    • Returns graph of correlation relationship
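
For reference, the Pearson coefficient is the covariance of the two columns divided by the product of their standard deviations. The sketch below computes only the coefficient; the p-value is omitted because it needs the t distribution, which the standard library lacks. The function name is hypothetical and this is not the code in PeaR.py.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("both columns must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

The result ranges from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship). A constant column makes the denominator zero, so it has to be handled before calling the function.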


  • SpeaR.py (New error dealing with zeros)

    • Returns Spearman correlation coefficient
    • Returns p-value
    • Returns graph of correlation relationship
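
The Spearman coefficient is the Pearson coefficient computed on ranks, with tied values sharing the average of their ranks. The sketch below follows that definition; the function names are hypothetical, the p-value is omitted, and SpeaR.py itself may delegate to a library.

```python
import math

def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```

Because it works on ranks, the coefficient reaches 1 for any strictly increasing relationship, not only a linear one.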



  • ANowoa.py (New pandas index error)

    • Performs one-way or two-way Analysis of Variance (ANOVA) (optional)
    • Returns Analysis of Variance between two or more group means
    • Returns Degrees of Freedom, Sum of Squares, Mean Square
    • Returns F-value and p-value
    • Returns Eta squared and Omega squared for effect size
    • Returns ANOVA table and variables scatter graph
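
The one-way case can be sketched with the standard sums-of-squares decomposition. This is an illustration only: the function name `one_way_anova` is hypothetical, the two-way case is not covered, and the p-value is omitted because it requires the F distribution (available, for instance, in SciPy).

```python
def one_way_anova(groups):
    """Return the main entries of a one-way ANOVA table plus effect sizes."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ss_total = ss_between + ss_within

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "df": (df_between, df_within),
        "ss": (ss_between, ss_within),
        "ms": (ms_between, ms_within),
        "f": ms_between / ms_within,
        "eta_squared": ss_between / ss_total,
        "omega_squared": (ss_between - df_between * ms_within) / (ss_total + ms_within),
    }
```

Eta squared is the share of total variance explained by group membership; omega squared is a less biased estimate of the same quantity.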






      More data visualization coming soon...


How to Python:

Downloads here!

- Macintosh.
- Unix.
- Windows:
~ Tutorial for Windows installation.
~ Easy Way to run Python Programs on Windows.


