<a href="https://colab.research.google.com/github/shirleyrutgers/DataVis/blob/main/ps0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analysis of Entrepreneurship in New Jersey, USA

## Research Question:
How does the age of a business influence job creation in New Jersey, USA?

## Hypothesis:
Younger companies (0-1 years) generate more jobs than older companies (11+ years).

## Variables:
* Independent variable: age of the company (young vs. established).
* Dependent variable: proportion of jobs generated in the private sector.

## Analysis:
*   Calculation of the proportion of jobs created by young and established companies in each sector, in New Jersey, USA.
*   Descriptive statistics and a graphical correlation analysis to evaluate the relationship between the age of the company and the number of jobs generated.
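
The two analysis steps above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's final analysis: it uses only the five New Jersey rows for 2001 from the Kauffman file, and the helper names `summarize_by_age` and `age_creation_rank_correlation` are hypothetical. Treating `demographic-code` as an ordinal stand-in for company age is also an assumption.

```python
import pandas as pd

# The five New Jersey rows for 2001 from the Kauffman file used in this notebook.
rows_2001 = pd.DataFrame({
    "demographic-code": [1, 2, 3, 4, 5],
    "demographic": ["Ages 0 to 1", "Ages 2 to 3", "Ages 4 to 5",
                    "Ages 6 to 10", "Ages 11+"],
    "contribution": [0.037775, 0.045582, 0.048891, 0.112037, 0.755715],
    "creation": [4.663433, -0.252688, -0.543881, -2.259242, -6.306614],
})

def summarize_by_age(df):
    """Mean contribution and creation per age category, ordered by age code."""
    return (
        df.groupby(["demographic-code", "demographic"])[["contribution", "creation"]]
        .mean()
        .reset_index()
    )

def age_creation_rank_correlation(df):
    """Rank correlation between age code and creation (Pearson on the ranks)."""
    return df["demographic-code"].rank().corr(df["creation"].rank())

summary = summarize_by_age(rows_2001)
print(summary)
print(age_creation_rank_correlation(rows_2001))
```

With a single year the grouping is trivial and the correlation rests on five points, so it is descriptive at best; the same helpers could be applied to the full multi-year frame once it is loaded.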

## Description
The Entrepreneurial Jobs series, provided by the Kauffman Foundation (2025), is part of the Kauffman Indicators of Entrepreneurship, which measure entrepreneurial trends in the United States.

To analyze the relationship between the age of companies and the proportion of jobs they generate, I use the following variables from the data file.

## Variables from the Dataset

* fips: State FIPS code (34 for New Jersey).
* name: Name of the state (New Jersey).
* geo_level: Geographic level (S for state).
* year: Year of the data.
* demographic-type: Type of demographic classification (here "Age of Business", which indicates the age of the business).
* demographic-code: Numerical code for the age category of the business.
* demographic: Age category of the business (for example, "Ages 0 to 1", "Ages 2 to 3", etc.).
* contribution: Contribution of this group to entrepreneurial jobs.
* compensation: Level of compensation (wages).
* constancy: Level of stability or retention of employment.
* creation: Job creation (change in jobs generated).
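
A quick way to confirm that a loaded file actually carries the variables listed above is a small column check. The sketch below is only an illustration: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names, and the empty stand-in frame is built by hand so the block runs without the real file.

```python
import pandas as pd

# Column names exactly as listed in the variable description above.
EXPECTED_COLUMNS = [
    "fips", "name", "geo_level", "year", "demographic-type",
    "demographic-code", "demographic", "contribution",
    "compensation", "constancy", "creation",
]

def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df, in listed order."""
    return [col for col in expected if col not in df.columns]

# Stand-in frame with no rows; "creation" is left out on purpose.
example = pd.DataFrame(columns=EXPECTED_COLUMNS[:-1])
print(missing_columns(example))  # → ['creation']
```

Calling `missing_columns` on the real DataFrame right after loading it would return an empty list if every variable is present.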

## Sources
Ewing Marion Kauffman Foundation. Kauffman Indicators of Entrepreneurship (2025)
https://indicators.kauffman.org/data-downloads


In [3]:
#---------------------------SETUP----------------------------------
#get useful libraries
import time, os, sys, re #basics
import zipfile, json, datetime, string   #string for annotating points in scatter
import numpy as np #basic math
from statistics import * #stats

import matplotlib.pyplot as plt #import pylab as plt #apparently discouraged now:
 #https://stackoverflow.com/questions/11469336/what-is-the-difference-between-pylab-and-pyplot
 #https://www.tutorialspoint.com/matplotlib/matplotlib_pylab_module.htm

import pandas as pd
import pandas_datareader as pdr
from pandas_datareader import wb
from pandas.io.formats.style import Styler
#s4 = Styler(df4, uuid_len=0, cell_ids=False)

import urllib  #note: a bare `import urllib` does not load submodules such as urllib.request; they only work if another library already imported them  %TODO/LATER ditch it and just use wget/curl

from google.colab import files

#import webbrowser

import seaborn as sns

from google.colab import data_table
data_table.enable_dataframe_formatter() #enables the spreadsheet view when a dataframe is displayed by name (without parentheses)

#tricks how to extend notebook functionality
#https://coderzcolumn.com/tutorials/python/list-of-useful-magic-commands-in-jupyter-notebook-lab
#will display all output not just last command
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

###magics: https://ipython.readthedocs.io/en/stable/interactive/magics.html
#most essential setup for vis: the backend does affect the output, so be careful! stick with inline, maybe notebook; the others are mostly for non-notebook environments, e.g. Spyder
#https://jakevdp.github.io/PythonDataScienceHandbook/04.00-introduction-to-matplotlib.html recommends *inline*!
#show current one:
#%matplotlib
#%matplotlib --list
#interactive plots:
#%matplotlib notebook
#static images of your plot:
%matplotlib inline
#this one and other magics (btw default is probably agg)
#%matplotlib nbagg
##https://www.marktechpost.com/2023/10/20/6-magic-commands-for-jupyter-notebooks-in-python-data-science/
#%%latex
#%ai
#%run
#%writefile
#%history -n

###themes/styles: https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html
#https://jakevdp.github.io/PythonDataScienceHandbook/04.11-settings-and-stylesheets.html
#https://matplotlib.org/stable/tutorials/introductory/customizing.html
#here more about art and style than under the hood functionality as with magics, explore and experiment
#many may find 'default' or seaborn ones more pleasing; my fav 'classic' is back from 90s ;)
#plt.style.available #list available styles :) may install more
#plt.style.use('default') # more delicate subtle than classic
plt.style.use('classic')  #  'seaborn-whitegrid' 'seaborn-white' 'seaborn-poster'
# btw: the order of magics vs. theme/style matters, e.g. if I specify the classic style before the inline magic, I don't get the grey bounding box I'm getting now

#sometimes have to install library which you get from https://pypi.org/
#!pip install geopandas


# Step 1. Download the file and load it into Colab
> The data were downloaded as a CSV file, cleaned, filtered to keep only New Jersey, and saved to Google Drive. Source database: https://indicators.kauffman.org/


Sources:
* YouTube tutorial. "Como importar CSV a Google Colab", (2025). https://www.youtube.com/watch?v=KCbpPhr_7DY

* YouTube tutorial. "Dos alternativas para cargar un archivo CSV a nuestro colab", (2025). https://www.youtube.com/watch?v=pLAoxHXOeuk
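
The cleaning described in Step 1 was done before the file was saved to Google Drive, and its exact steps are not shown here. A filter along the following lines would give a New Jersey-only table; treat it as an assumption about what that cleaning looked like. The function name `filter_new_jersey_by_age` is hypothetical, and the second toy row is a made-up placeholder, not real data.

```python
import pandas as pd

def filter_new_jersey_by_age(df):
    """Keep state-level New Jersey rows classified by age of business."""
    mask = (
        (df["fips"] == 34)                                # New Jersey FIPS code
        & (df["geo_level"] == "S")                        # state-level rows only
        & (df["demographic-type"] == "Age of Business")
    )
    return df.loc[mask].reset_index(drop=True)

# Toy input: the first row comes from the New Jersey file; the second row is an
# invented placeholder (fips 99 is not a real state) to show the filter working.
toy = pd.DataFrame({
    "fips": [34, 99],
    "name": ["New Jersey", "Placeholder State"],
    "geo_level": ["S", "S"],
    "year": [2001, 2001],
    "demographic-type": ["Age of Business", "Age of Business"],
    "demographic": ["Ages 0 to 1", "Ages 0 to 1"],
    "creation": [4.663433, 0.0],
})

nj_only = filter_new_jersey_by_age(toy)
print(nj_only)
```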


In [12]:
#link from my drive, published as CSV
import pandas as pd
url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRLyyJl7NSl3lyqR3tcRKaUljUukA0XpwsQXe3tTROPRx2zamZtqSrASf10eDQLPecEfyz-v9Dw4ItM/pub?gid=805463561&single=true&output=csv'
eji_nj2001_2020 = pd.read_csv(url)


Display the head of the DataFrame, which shows its first five rows.

In [15]:
eji_nj2001_2020.head()

| Unnamed: 0 | fips | name | geo_level | year | demographic-type | demographic-code | demographic | contribution | compensation | constancy | creation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 34 | New Jersey | S | 2001 | Age of Business | 1 | Ages 0 to 1 | 0.037775 | 0.989571 | 0.535145 | 4.663433 |
| 1 | 34 | New Jersey | S | 2001 | Age of Business | 2 | Ages 2 to 3 | 0.045582 | 1.008414 | 0.631253 | -0.252688 |
| 2 | 34 | New Jersey | S | 2001 | Age of Business | 3 | Ages 4 to 5 | 0.048891 | 1.085714 | 0.667015 | -0.543881 |
| 3 | 34 | New Jersey | S | 2001 | Age of Business | 4 | Ages 6 to 10 | 0.112037 | 1.112621 | 0.652145 | -2.259242 |
| 4 | 34 | New Jersey | S | 2001 | Age of Business | 5 | Ages 11+ | 0.755715 | 1.271253 | 0.743614 | -6.306614 |
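
As a first look at the hypothesis, the youngest and oldest age categories in these 2001 rows can be compared directly. This is only a sketch on one year of data, with the values re-typed by hand so the block runs on its own. The variable names are hypothetical, and the units of `creation` are not stated in the file description, so the gap should be read as a relative comparison only.

```python
import pandas as pd

# 2001 New Jersey values for the youngest and oldest categories and those in between.
preview_2001 = pd.DataFrame({
    "demographic": ["Ages 0 to 1", "Ages 2 to 3", "Ages 4 to 5",
                    "Ages 6 to 10", "Ages 11+"],
    "contribution": [0.037775, 0.045582, 0.048891, 0.112037, 0.755715],
    "creation": [4.663433, -0.252688, -0.543881, -2.259242, -6.306614],
}).set_index("demographic")

young = preview_2001.loc["Ages 0 to 1"]
established = preview_2001.loc["Ages 11+"]

# Difference in the creation indicator between young and established firms.
creation_gap = young["creation"] - established["creation"]
# How many times larger the established firms' contribution is.
contribution_ratio = established["contribution"] / young["contribution"]

print(f"creation gap (young - established): {creation_gap:.6f}")
print(f"contribution ratio (established / young): {contribution_ratio:.2f}")
```

In these rows the youngest firms have a positive `creation` value (4.66) while firms aged 11+ have a negative one (-6.31), even though the established firms account for about 76% of `contribution`. A single year cannot confirm the hypothesis, so the same comparison would need to be repeated across all years in the file.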
