# Practice 2-2: Structured Data in Python

## 📚  <b> Practice 1. </b> 

Dictionaries are an extremely common data structure. Often data is easiest to store according to keys, where each row of data has a unique key. 

For example, a dictionary of students might have a key for each student's name or id, and then a set of values that are associated with that student.
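
As a small illustration of keyed storage, the sketch below uses made-up student ids and fields (every name and value here is hypothetical) to show a direct lookup and a safe lookup for a key that may not exist.

```python
# Hypothetical student records keyed by a unique student id.
students = {
    "s001": {"name": "Ada", "year": 2},
    "s002": {"name": "Grace", "year": 3},
}

# Look up one record directly by its key.
ada = students["s001"]
print(ada["name"])  # Ada

# .get() returns a default instead of raising KeyError for a missing key.
missing = students.get("s999", {})
print(missing)  # {}
```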

### Make a dictionary of your peers.

Come up with 3-4 questions and ask 5-6 of your peers to give answers to each one. The questions can be anything you want, but they should be questions that can be answered with a single word or number.

For example, you might ask:
- What is your favorite color? (pretty weak question)
- How many siblings do you have? (ok, but not great)
- What is your favorite genre of film? (better)
- On a scale of 1-10, how well do you feel that you understand Python dictionaries? (better still)

Make sure you have a key for each person, and then a set of values for each question. 

Based on each person's responses, build a dictionary in the cell below:

(note: You will end up with a dictionary of dictionaries)

```python
# Example:
my_dict = {
    'person1': {
        'question1': 'answer1',
        'question2': 'answer2',
        'question3': 'answer3',
        'question4': 'answer4',
    },
    'person2': {
        'question1': 'answer1',
        'question2': 'answer2',
        'question3': 'answer3',
        'question4': 'answer4',
    },
    'person3': {
        'question1': 'answer1',
        'question2': 'answer2',
        'question3': 'answer3',
        'question4': 'answer4',
    },
}
```
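
Reading values back out of a dictionary of dictionaries takes two keys: one for the person and one for the question. The sketch below assumes a hypothetical `survey` dictionary with invented questions and answers. It is only meant to show the access pattern.

```python
# Hypothetical survey answers in the same nested shape as the example above.
survey = {
    "person1": {"favorite genre": "comedy", "dictionary confidence": 7},
    "person2": {"favorite genre": "horror", "dictionary confidence": 4},
}

# The outer key selects the person, the inner key selects the question.
print(survey["person2"]["favorite genre"])  # horror

# Collect every person's answer to one question.
confidence = {person: answers["dictionary confidence"]
              for person, answers in survey.items()}
print(confidence)  # {'person1': 7, 'person2': 4}

# Numeric answers can be aggregated directly.
average = sum(confidence.values()) / len(confidence)
print(average)  # 5.5
```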

In [15]:
my_dict = {
    'Oksana': {
        'what is your favorite cusine': 'fusion',
        'cats or dogs or rats': 'dogs',
        'favorite dessert': 'strudent',
        'least favorite food': 'steak',
    },
    'Rosemary': {
        'what is your favorite cusine': 'Mexican',
        'cats or dogs or rats': 'rats',
        'favorite dessert': 'cheesecake',
        'least favorite food': 'blue cheese',
    },
    'Miriam': {
        'what is your favorite cusine': 'Mexican',
        'cats or dogs or rats': 'cats',
        'favorite dessert': 'cream puffs',
        'least favorite food': 'unseasoned',
    },
    'Hazel': {
        'what is your favorite cusine': 'Mexican',
        'cats or dogs or rats': 'cats',
        'favorite dessert': 'strawberry shortcake',
        'least favorite food': 'eggs',
    }
}

## Dictionary to dataframe

Now that you have a dictionary, you can convert it to a dataframe.

### Convert your dictionary to a dataframe

Use the `pd.DataFrame()` constructor to convert your dictionary to a dataframe.

```python
import pandas as pd

# Convert a dictionary to a dataframe
df = pd.DataFrame(my_dict)
```
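
It may help to see how the nesting maps onto rows and columns. In the sketch below, `answers` is a hypothetical two-person dictionary. The outer keys become columns by default, and `pd.DataFrame.from_dict` with `orient="index"` is one way to make them rows instead.

```python
import pandas as pd

# Hypothetical two-person survey in the dictionary-of-dictionaries shape.
answers = {
    "person1": {"question1": "a", "question2": "b"},
    "person2": {"question1": "c", "question2": "d"},
}

# Outer keys (people) become columns; inner keys (questions) become the index.
by_column = pd.DataFrame(answers)
print(list(by_column.columns))  # ['person1', 'person2']
print(list(by_column.index))    # ['question1', 'question2']

# orient="index" makes the outer keys the rows instead.
by_row = pd.DataFrame.from_dict(answers, orient="index")
print(list(by_row.index))    # ['person1', 'person2']
print(list(by_row.columns))  # ['question1', 'question2']
```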


In [16]:
import pandas as pd
df = pd.DataFrame(my_dict)



### Investigate your dataframe using the following functions:

- `df.head()`
- `df.tail()`
- `df.info()`



In [19]:

df.info()


<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, what is your favorite cusine to least favorite food
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Oksana    4 non-null      object
 1   Rosemary  4 non-null      object
 2   Miriam    4 non-null      object
 3   Hazel     4 non-null      object
dtypes: object(4)
memory usage: 160.0+ bytes


Is your dataframe what you expected? If not, what is different, and why? How would you need to change your data structure to get the dataframe you expected?

In [None]:
# it is what i expected



`df.transpose()` might be helpful here. What does it do?
It creates a new dataframe that is a transposed version of your original dataframe: the rows become columns and the columns become rows.

```python
# Transpose a dataframe
df_transposed = df.transpose()
```

In [20]:
# Transpose your dataframe so that the people are the rows and the questions are the columns.

# Note the parentheses: `df.transpose` without them only returns the bound method
# instead of calling it.
df_transposed = df.transpose()
df_transposed

### Visualize/summarize your dataframe using one or more of the following functions:

- `df.describe()`
- `df.plot()`
- `df.hist()`

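Note that `df.hist()` fails on this dataframe because every column holds text (dtype `object`) rather than numbers:

ValueError: hist method requires numerical or datetime columns, nothing to plot.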
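
Text columns can still be summarized by counting answers. The sketch below is one possible approach, using a hypothetical `pets` dataframe with invented respondent labels.

```python
import pandas as pd

# Hypothetical answers; the column holds text rather than numbers.
pets = pd.DataFrame(
    {"pet": ["dogs", "rats", "cats", "cats"]},
    index=["p1", "p2", "p3", "p4"],
)

# describe() on text columns reports count, unique, top, and freq.
summary = pets.describe()
print(summary)

# value_counts() tallies how often each answer appears.
counts = pets["pet"].value_counts()
print(counts["cats"])  # 2
```

If matplotlib is installed, the resulting counts are numeric and can be drawn with `counts.plot(kind="bar")`.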

## 📚  <b> Practice 2. </b>

### Structured data search: Find structured data on the internet and convert it to a dataframe.

You're looking for data that is in a table format, like a spreadsheet, but not available as an easy-to-download `.csv` or Excel file. This turns out to describe a lot of data!

Often you can find this kind of data on Wikipedia, on government websites, or in research articles that contain tables of results.


In [30]:
import io
import requests

url = "https://static.anaconda.cloud/content/Anaconda_2022_State_of_Data_Science_+Raw_Data.csv"

# Download the file with the requests library; raise an error on a bad HTTP status
response = requests.get(url)
response.raise_for_status()

# Parse the downloaded CSV text into a dataframe
df1 = pd.read_csv(io.StringIO(response.text))
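
A downloaded CSV arrives as text that still has to be parsed into rows and columns. The sketch below assumes a hypothetical `csv_text` string standing in for the body of a response, so it runs without network access.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the body of a downloaded file
# (with requests, this would be response.text).
csv_text = "city,population\nA,100\nB,250\n"

# read_csv expects a path or file-like object, so wrap the string in StringIO.
cities = pd.read_csv(io.StringIO(csv_text))
print(cities.shape)  # (2, 2)
print(cities["population"].sum())  # 350
```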

### Convert your data to a dataframe

In [31]:
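from lxml import html

url = "https://en.wikipedia.org/wiki/Sprite_(computer_graphics)"
response = requests.get(url)

html_content = response.content

tree = html.fromstring(html_content)

# Take the first table on the page and collect its rows
table_element = tree.xpath('//table')[0]
rows = table_element.xpath('.//tr')

data = []

for row in rows:
    cols = row.xpath('.//td')
    # text_content() gathers text from nested tags and returns '' for empty cells;
    # .text is None when a cell is empty or starts with a child element
    cols = [col.text_content().strip() for col in cols]
    if cols:  # skip header rows, which contain no <td> cells
        data.append(cols)

# Convert the list of rows to a dataframe
df2 = pd.DataFrame(data)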
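
An lxml element's `.text` is `None` when a cell is empty or begins with a child tag such as a link, so calling `.strip()` on it fails. The sketch below uses a hypothetical inline HTML snippet to compare `.text` with `.text_content()`, which collects the text of nested elements as well.

```python
from lxml import html

# Hypothetical table: one cell wraps its text in a link, one is plain, one is empty.
snippet = (
    "<table><tr>"
    "<td><a href='#'>Link</a></td><td>Plain</td><td></td>"
    "</tr></table>"
)
table = html.fromstring(snippet)
cells = table.xpath(".//td")

# .text only covers text that appears before the first child element.
raw = [cell.text for cell in cells]
print(raw)  # [None, 'Plain', None]

# text_content() includes nested text and returns '' for an empty cell.
clean = [cell.text_content().strip() for cell in cells]
print(clean)  # ['Link', 'Plain', '']
```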

In [None]:
import requests

url = 'https://www.example.com/table.html'
response = requests.get(url)

html_content = response.content

from lxml import html

tree = html.fromstring(html_content)

table_element = tree.xpath('//table')[0]
rows = table_element.xpath('.//tr')

data = []
for row in rows:
    cols = row.xpath('.//td')
    # text_content() returns '' for empty cells, whereas .text can be None
    cols = [col.text_content().strip() for col in cols]
    data.append(cols)

In [34]:
from io import StringIO  # wraps an HTML string as a file-like object

import pandas as pd  # library for data analysis
import requests  # library to handle requests
from bs4 import BeautifulSoup  # library to parse HTML documents

# get the response in the form of html
wikiurl = "https://en.wikipedia.org/wiki/List_of_cities_in_India_by_population"
response = requests.get(wikiurl)
print(response.status_code)

# parse data from the html into a BeautifulSoup object and pick the first wikitable
soup = BeautifulSoup(response.text, 'html.parser')
indiatable = soup.find('table', {'class': "wikitable"})

# read_html returns a list of dataframes; recent pandas versions warn when given
# a raw HTML string, so the markup is wrapped in StringIO
tables = pd.read_html(StringIO(str(indiatable)))
df = tables[0]
print(df.head())

200
   Rank       City  Population (2011)[3]  Population (2001)[3][a]  \
0     1     Mumbai              12442373                 11978450   
1     2      Delhi              11034555                  9879172   
2     3  Bangalore               8443675                  5682293   
3     4  Hyderabad               6993262                  5496960   
4     5  Ahmedabad               5577940                  4470006   

  State or union territory  Ref  
0              Maharashtra  [3]  
1                    Delhi  [3]  
2                Karnataka  [3]  
3                Telangana  [3]  
4                  Gujarat  [3]  


## Explore your dataframe using the following functions:

- `df.describe()`
- `df.plot()`
- `df.hist()`
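
As one possible starting point, the sketch below rebuilds three rows from the printed output above into a small dataframe and derives a growth column. The shortened column names (`pop_2011`, `pop_2001`, `growth_pct`) are assumptions made for readability and are not the names in the scraped table.

```python
import pandas as pd

# Three rows copied from the output above; column names are shortened here.
cities = pd.DataFrame({
    "City": ["Mumbai", "Delhi", "Bangalore"],
    "pop_2011": [12442373, 11034555, 8443675],
    "pop_2001": [11978450, 9879172, 5682293],
})

# describe() summarizes the numeric columns (count, mean, std, quartiles).
print(cities.describe())

# Percentage growth between the two years.
cities["growth_pct"] = (cities["pop_2011"] / cities["pop_2001"] - 1) * 100

# idxmax() gives the row label of the largest growth value.
fastest = cities.loc[cities["growth_pct"].idxmax(), "City"]
print(fastest)  # Bangalore
```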