# Generating a DataSet of Black Analytics Members
Jupyter notebooks give you the flexibility to both document and execute your code in the same location.  Jupyter notebooks are also language-agnostic, so you can run code in essentially any language here as long as you have enabled the necessary kernel to execute it.  The name Jupyter is a combination of the languages **Ju**lia, **Pyt**hon and **R**.  I have worked with 2 of the 3 languages so far (Python, R).  These languages and Jupyter notebooks are extremely useful when performing Data Science tasks.  We will go through the process of generating a dataset of all Black Analytics members within this notebook and outputting that result into a *csv* file for later consumption by any other data-processing tool.

Let us start off by deciding what type of information we would like to maintain within this dataset.  Some fields that I thought would be useful to maintain are the following:

- First Name
- Middle Initial
- Last Name
- Suffix Present
- Suffix Name
- City of Origin
- State of Origin
- Zipcode of Origin
- Country of Origin
- Profession


As you can see, the number of fields and the amount of information that you can accumulate on an individual can grow very fast!!  You may also notice that several fields are kept separate even though they describe a single entity.  For example, we have **5** fields just to describe a person's name.  This level of separation makes it very easy to analyze the entries individually, as we will see later in this tutorial.  Let us get started with some code.  This notebook is intended to be used with **Python 3**.  It will not work properly in **Python 2**.

*Python* is a general-purpose programming language that is commonly used in Data Science.  Its ease of use makes it a very approachable language to learn if you are not familiar with programming.  We will be using a very popular Data Science module called **pandas**.  Before we use the pandas module, we must import it:

In [1]:
import pandas as pd

The *as pd* allows us to just say "pd" in the code instead of typing out pandas every time we want to call a function from it.  The most important object in the pandas module is called **DataFrame**.  This is a very useful object for storing data, and it renders very well within Jupyter notebooks.  Let us create our DataFrame object:

In [23]:
columns = ['First Name', 'Middle Initial', "Last Name", "Suffix Present", "Suffix Name",
           'City of Origin', "State of Origin","Zipcode of Origin", "Country of Origin","Profession"]


d = {'First Name' : ['Ransford'], 'Middle Initial': ['M'], "Last Name": ["Hyman"], "Suffix Present": ["Yes"], "Suffix Name":["Jr."],
    'City of Origin': ["Atlanta"], "State of Origin":["Georgia"], "Zipcode of Origin": [30331], "Country of Origin":["United States"],
    "Profession": ["Software Engineer"]}
members = pd.DataFrame(d)

Let us see what this data frame looks like:

In [3]:
members

Unnamed: 0,City of Origin,Country of Origin,First Name,Last Name,Middle Initial,Profession,State of Origin,Suffix Name,Suffix Present,Zipcode of Origin
0,Atlanta,United States,Ransford,Hyman,M,Software Engineer,Georgia,Jr.,Yes,30331
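The output above lists the columns alphabetically, and the `columns` list defined earlier is never passed to the constructor. A minimal sketch, assuming the intended order is the one in that list: passing `columns=` to `pd.DataFrame` selects and orders the columns explicitly. The shortened field list here is only for illustration.

```python
import pandas as pd

# Shortened version of the notebook's field list (illustration only).
columns = ["First Name", "Middle Initial", "Last Name", "Profession"]

d = {
    "Profession": ["Software Engineer"],
    "Last Name": ["Hyman"],
    "First Name": ["Ransford"],
    "Middle Initial": ["M"],
}

# columns= fixes the column order regardless of the dictionary's key order.
members = pd.DataFrame(d, columns=columns)
print(list(members.columns))
```

With the order pinned this way, the DataFrame reads left to right the same way the field list does.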


We will add data to our variable **d** and then re-create the members DataFrame.  Typically you would fully populate the dictionary **d** and then create a DataFrame from that object.  The way that we are updating the DataFrame is just for this example.

In [24]:
d['First Name'].append('Arthur')
d['Middle Initial'].append('L')
d['Last Name'].append('Talley')
d['Suffix Present'].append('No')
d['Suffix Name'].append("N/A")
d['City of Origin'].append("Detroit")
d['State of Origin'].append('Michigan')
d['Zipcode of Origin'].append(42805)
d['Country of Origin'].append("United States")
d['Profession'].append("Sr. Finance Analyst")

In [25]:
members = pd.DataFrame(d)

members

Unnamed: 0,City of Origin,Country of Origin,First Name,Last Name,Middle Initial,Profession,State of Origin,Suffix Name,Suffix Present,Zipcode of Origin
0,Atlanta,United States,Ransford,Hyman,M,Software Engineer,Georgia,Jr.,Yes,30331
1,Detroit,United States,Arthur,Talley,L,Sr. Finance Analyst,Michigan,,No,42805
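Appending to ten separate lists by hand makes it easy to skip one, and `pd.DataFrame` raises an error when the lists end up with different lengths. One possible safeguard is a small helper that adds a whole member at once. The `add_member` function below is hypothetical and not part of the original notebook; it is a sketch that uses only three of the fields.

```python
import pandas as pd

def add_member(data, record):
    """Append one member to every column list in `data` (hypothetical helper).

    Raises KeyError if `record` is missing a field or contains an unknown one,
    so the column lists can never drift out of sync.
    """
    missing = set(data) - set(record)
    unknown = set(record) - set(data)
    if missing or unknown:
        raise KeyError(
            f"missing fields: {sorted(missing)}, unknown fields: {sorted(unknown)}"
        )
    for column, value in record.items():
        data[column].append(value)

d = {
    "First Name": ["Ransford"],
    "Last Name": ["Hyman"],
    "Profession": ["Software Engineer"],
}

add_member(d, {
    "First Name": "Arthur",
    "Last Name": "Talley",
    "Profession": "Sr. Finance Analyst",
})

members = pd.DataFrame(d)
print(members)
```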


Now it is your turn!! Enter your information in the cell below just like I have done in the example above:

In [26]:
#Enter your data here below this line
d['First Name'].append('Jason')
d['Middle Initial'].append('T')
d['Last Name'].append('Fleming')
d['Profession'].append('Senior Business Analyst')
d['State of Origin'].append('Kentucky')
d['Suffix Name'].append('N/A')
d['Suffix Present'].append('No')
d['Zipcode of Origin'].append('40014')
d['City of Origin'].append('Crestwood')
d['Country of Origin'].append('United States')



In [17]:
d

{'City of Origin': ['Atlanta', 'Detroit', 'Crestwood'],
 'Country of Origin': ['United States', 'United States', 'United States'],
 'First Name': ['Ransford', 'Arthur', 'Jason'],
 'Last Name': ['Hyman', 'Talley', 'Fleming'],
 'Middle Initial': ['M', 'L', 'T'],
 'Profession': ['Software Engineer',
  'Sr. Finance Analyst',
  'Senior Business Analyst'],
 'State of Origin': ['Georgia', 'Michigan', 'Kentucky'],
 'Suffix Name': ['Jr.', 'N/A', 'N/A'],
 'Suffix Present': ['Yes', 'No', 'No'],
 'Zipcode of Origin': [30331, 42805, '40014']}
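Notice that the zip codes in the dictionary above are a mix of integers and strings. Zip codes are labels rather than quantities, and some begin with a zero, so one reasonable convention (an assumption here, not something the notebook prescribes) is to store them all as five-character strings. A minimal sketch of that clean-up:

```python
import pandas as pd

members = pd.DataFrame({
    "City of Origin": ["Atlanta", "Detroit", "Crestwood"],
    "Zipcode of Origin": [30331, 42805, "40014"],  # mixed int and str, as above
})

# Convert every entry to text, then left-pad with zeros to five characters.
members["Zipcode of Origin"] = (
    members["Zipcode of Origin"].astype(str).str.zfill(5)
)
print(members["Zipcode of Origin"].tolist())
```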

Now let us see if the information that you have entered generates the correct DataFrame.  Run the cell below by pressing **Shift + Enter** while the cell is selected.

In [18]:
members = pd.DataFrame(d)
members

Unnamed: 0,City of Origin,Country of Origin,First Name,Last Name,Middle Initial,Profession,State of Origin,Suffix Name,Suffix Present,Zipcode of Origin
0,Atlanta,United States,Ransford,Hyman,M,Software Engineer,Georgia,Jr.,Yes,30331
1,Detroit,United States,Arthur,Talley,L,Sr. Finance Analyst,Michigan,,No,42805
2,Crestwood,United States,Jason,Fleming,T,Senior Business Analyst,Kentucky,,No,40014


**Excellent!!!** Now that you have made it this far, enter the other members' information:

In [27]:
#Enter Kingsley A's information here
d['First Name'].append('Kingsley')
d['Middle Initial'].append('D')
d['Last Name'].append('Adeoye')
d['Suffix Present'].append('No')
d['Suffix Name'].append('N/A')
d['City of Origin'].append('Sacramento')
d['State of Origin'].append('California')
d['Zipcode of Origin'].append('95670')
d['Country of Origin'].append('United States')
d['Profession'].append('Firmware Engineer')

In [28]:
#Enter Mike R's information here
d['First Name'].append('Michael')
d['Middle Initial'].append('F ing')
d['Last Name'].append('Reynolds')
d['Suffix Present'].append('Yes')
d['Suffix Name'].append('Sr.')
d['City of Origin'].append('Philly')
d['State of Origin'].append('Pennsylvania')
d['Zipcode of Origin'].append('19131')
d['Country of Origin'].append('United States')
d['Profession'].append('Entrepreneur/Software Engineer')

In [29]:
#Enter Darrell's information here
d['First Name'].append('Darrell')
d['Middle Initial'].append('W')
d['Last Name'].append('Johnson')
d['Suffix Present'].append('Yes')
d['Suffix Name'].append('Jr.')
d['City of Origin'].append('El Dorado Hills')
d['State of Origin'].append('California')
d['Zipcode of Origin'].append('95762')
d['Country of Origin'].append('United States')
d['Profession'].append('IT Manager')

In [None]:
#Enter Vick's information here

Now that we have all the information that we need for our dataset, let us create our final DataFrame.  Enter the code below that creates the DataFrame in the *members* variable and displays it.

In [30]:
members = pd.DataFrame(d)

members

Unnamed: 0,City of Origin,Country of Origin,First Name,Last Name,Middle Initial,Profession,State of Origin,Suffix Name,Suffix Present,Zipcode of Origin
0,Atlanta,United States,Ransford,Hyman,M,Software Engineer,Georgia,Jr.,Yes,30331
1,Detroit,United States,Arthur,Talley,L,Sr. Finance Analyst,Michigan,,No,42805
2,Crestwood,United States,Jason,Fleming,T,Senior Business Analyst,Kentucky,,No,40014
3,Sacramento,United States,Kingsley,Adeoye,D,Firmware Engineer,California,,No,95670
4,Philly,United States,Michael,Reynolds,F ing,Entrepreneur/Software Engineer,Pennsylvania,Sr.,Yes,19131
5,El Dorado Hills,United States,Darrell,Johnson,W,IT Manager,California,Jr.,Yes,95762
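Keeping each piece of information in its own column is what makes questions about a single field easy to answer. As a small sketch, using only the first names and states from the table above, `value_counts` tallies members per state and a boolean mask picks out the members from one state:

```python
import pandas as pd

members = pd.DataFrame({
    "First Name": ["Ransford", "Arthur", "Jason", "Kingsley", "Michael", "Darrell"],
    "State of Origin": ["Georgia", "Michigan", "Kentucky",
                        "California", "Pennsylvania", "California"],
})

# How many members come from each state?
by_state = members["State of Origin"].value_counts()
print(by_state)

# Which members are from California?
is_californian = members["State of Origin"] == "California"
californians = members.loc[is_californian, "First Name"].tolist()
print(californians)
```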


Now that we have created the DataFrame, we can write it out in whatever file format we need.  Below I will create a *csv* file from our DataFrame.

In [31]:
members.to_csv("botf.csv")
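By default `to_csv` also writes the row index as an unnamed first column, and reading a csv back with default settings treats the text `N/A` as a missing value and parses zip codes as numbers. If that is not what you want, the sketch below shows one way around it. It uses an in-memory buffer instead of a file and a cut-down DataFrame, both of which are illustrative choices.

```python
import io
import pandas as pd

members = pd.DataFrame({
    "First Name": ["Ransford", "Arthur"],
    "Suffix Name": ["Jr.", "N/A"],
    "Zipcode of Origin": ["30331", "42805"],
})

# index=False leaves the row index out of the csv text.
csv_text = members.to_csv(index=False)

restored = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Zipcode of Origin": str},  # keep zip codes as text
    keep_default_na=False,             # keep "N/A" as a literal string
)
print(restored)
```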

And since your name is Jason and Brother Kings is so gung-ho on the JSON format, let's create a JSON file for him to play around with.

In [32]:
members.to_json("botf.json")
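With no arguments, `to_json` groups the output by column. If whoever consumes the file would rather have one JSON object per member, the `orient="records"` option produces that layout. A small sketch, returning the JSON as a string instead of writing a file so the result can be inspected:

```python
import json
import pandas as pd

members = pd.DataFrame({
    "First Name": ["Ransford", "Arthur"],
    "Last Name": ["Hyman", "Talley"],
})

# orient="records" yields a list with one object per row.
json_text = members.to_json(orient="records")
records = json.loads(json_text)
print(records[0])
```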

***And there you have it!!! Congratulations on creating your first DataSet in Python!! Job Well Done!!! You will be a Data Scientist in no time!!***