# Getting Data - II

In this section, you will learn to:
- Get data from APIs
    - Use the ```requests``` module to connect to a URL and fetch a response
    - Use ```json.loads()``` to convert a JSON object to a python dictionary
- Read PDF files in python using ```PyPDF2```


### Getting Data from APIs

APIs, or application programming interfaces, are created by companies and organisations to provide restricted access to data. Getting data from APIs is very common in data analysis: you can get financial data (stock prices etc.), social media data (Facebook, Twitter etc. provide APIs), weather data, and data about healthcare, music, food and drinks, and almost every other domain. 


Apart from being rich sources of data, there are other reasons to use APIs:
- Data that is updated in real time: if you use downloaded CSV files, you have to download the data manually and redo your analysis every time it changes. Through APIs, you can automate the process of getting real-time data.
- Easy access to structured and verified data: though you can scrape websites, APIs provide data directly in a structured format, and it is usually of better quality.
- Access to restricted data: you cannot scrape all websites easily, and scraping is often prohibited or illegal (e.g. Facebook, financial data etc.). APIs are often the only way to get this data.

There are many more reasons depending on the use cases and the domain of application.

A list of useful APIs is available here: https://github.com/toddmotto/public-apis

#### Example Use Case: Google Maps Geocoding API

Google Maps provides many APIs, one of which is the <a href="https://developers.google.com/maps/documentation/geocoding/start?authuser=1">Google Maps Geocoding API</a>. You can use it to geocode addresses, i.e. get the latitude-longitude coordinates, and vice-versa. 
    
To use the API, go to <a href="https://developers.google.com/maps/">Google Developers</a>, get an API key, and go to the Geocoding API page.


Once you have an API key, getting the geocoded data of an address is easy. For example, if you want to geocode the address "UpGrad, Nishuvi building, Anne Besant Road, Worli, Mumbai", you need to join the words using a "+", and provide the address and your API key in this format:

https://maps.googleapis.com/maps/api/geocode/json?address=UpGrad,+Nishuvi+building,+Anne+Besant+Road,+Worli,+Mumbai&key=YOUR_API_KEY


Thus, this is a three-step process:
- Join the words in the address with a plus sign to get the form ```words+in+the+address``` 
- Build the URL by appending the address and the API key, and connect to it
- Get a response from the API and convert it to a Python object (here, a dictionary)


In [1]:
import numpy as np
import pandas as pd

# Need requests to connect to the URL, json to convert JSON to dict
import requests, json
import pprint

# joining words in the address by a "+"
add = "UpGrad, Nishuvi building, Anne Besant Road, Worli, Mumbai"
split_address = add.split(" ")
address = "+".join(split_address)
print(address)



UpGrad,+Nishuvi+building,+Anne+Besant+Road,+Worli,+Mumbai
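Joining on spaces works for this address, but characters such as `&` or `#` inside an address would break the URL. As a hedged alternative, the standard library's `urllib.parse.urlencode` can build the query string. The sketch below assumes the same endpoint as above; `build_geocode_url` is a hypothetical helper name and `YOUR_API_KEY` is a placeholder. Note that `urlencode` also percent-encodes commas as `%2C`, which is an equivalent encoding of the same address.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(address, api_key):
    """Return the Geocoding API URL for a human-readable address."""
    # urlencode turns spaces into "+" and escapes reserved characters
    query = urlencode({"address": address, "key": api_key})
    return "{0}?{1}".format(GEOCODE_ENDPOINT, query)

url = build_geocode_url("UpGrad, Nishuvi building, Worli, Mumbai", "YOUR_API_KEY")
print(url)
```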


In [None]:
# json.loads() takes the JSON text as its argument
data = json.loads('{"status": "OK", "results": []}')

Now, we can connect to the Google Maps URL using the API key and the address and get a response. Like most APIs, Google Maps returns the geocoded data in JSON format, which is similar to a Python dict.

As seen in the earlier section, we use the ```requests.get(url)``` method to get data from a URL. 

In [2]:
api_key = "AIzaSyBXrK8md7uaOcpRpaluEGZAtdXS4pcI5xo"

url = "https://maps.googleapis.com/maps/api/geocode/json?address={0}&key={1}".format(address, api_key)
r = requests.get(url)

# The r.text attribute contains the text in the response object
print(type(r.text))
print(r.text)

<class 'str'>
{
   "error_message" : "You must enable Billing on the Google Cloud Project at https://console.cloud.google.com/project/_/billing/enable Learn more at https://developers.google.com/maps/gmp-get-started",
   "results" : [],
   "status" : "REQUEST_DENIED"
}



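The dict-like structure that you see above is a JSON object, which is the most common way of exchanging data through APIs. We can easily convert the JSON object to a Python dict using ```json.loads(json_object)```.

For a successful request, the JSON object contains various details of the address: the components of the address, the full address, the latitude and the longitude, PIN code, etc. The response shown above instead has the status ```REQUEST_DENIED``` and an empty ```results``` list, because billing is not enabled for the Google Cloud project that owns this API key.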

Let's convert the JSON to a dictionary, so that we can work with it easily.

In [3]:
# converting the json object to a dict using json.loads()
r_dict = json.loads(r.text)

# the pretty printing library pprint makes it easy to read large dictionaries
pprint.pprint(r_dict)

{'error_message': 'You must enable Billing on the Google Cloud Project at '
                  'https://console.cloud.google.com/project/_/billing/enable '
                  'Learn more at '
                  'https://developers.google.com/maps/gmp-get-started',
 'results': [],
 'status': 'REQUEST_DENIED'}
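As a small offline illustration of `json.loads()`, the sketch below parses a JSON string shaped like the response above (the error message is shortened) and checks the `status` field before touching `results`. It assumes that a successful geocoding response reports the status `OK`.

```python
import json

# JSON text shaped like the REQUEST_DENIED response shown above (message shortened)
response_text = '''
{
   "error_message" : "You must enable Billing on the Google Cloud Project",
   "results" : [],
   "status" : "REQUEST_DENIED"
}
'''

parsed = json.loads(response_text)  # JSON text -> Python dict
print(type(parsed))
print(sorted(parsed.keys()))

# Only use the results when the API reports success (assumed status value: "OK")
if parsed["status"] == "OK":
    print("number of results:", len(parsed["results"]))
else:
    print("request failed:", parsed["status"])
```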


In [3]:
# The dict has two main keys - status and results
r_dict.keys()

NameError: name 'r_dict' is not defined

```r_dict['results']``` is a list with one entry per matched address; each entry is a dict of attributes.

In [4]:
pprint.pprint(r_dict['results'])

NameError: name 'r_dict' is not defined

On closer inspection, you'll see that the latitude is contained in ```r_dict['results'][0]['geometry']['location']['lat']``` and the longitude in ```r_dict['results'][0]['geometry']['location']['lng']```.

In [5]:
lat = r_dict['results'][0]['geometry']['location']['lat']
lng = r_dict['results'][0]['geometry']['location']['lng']

print((lat, lng))

NameError: name 'r_dict' is not defined
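Indexing `results[0]` directly fails when the list is empty, as it is for a denied request. A minimal sketch of a more defensive lookup follows. `extract_latlng` is a hypothetical helper name, and the coordinates in `sample_ok` are made-up values used only to show the nesting of a successful response.

```python
def extract_latlng(r_dict):
    """Return (lat, lng) from a geocoding response dict, or None if there is no result."""
    results = r_dict.get("results", [])
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return (location["lat"], location["lng"])

# Made-up coordinates, only to illustrate the nesting described above
sample_ok = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 19.0, "lng": 72.8}}}],
}
sample_denied = {"status": "REQUEST_DENIED", "results": []}

print(extract_latlng(sample_ok))      # (19.0, 72.8)
print(extract_latlng(sample_denied))  # None
```

Returning `None` instead of raising lets the caller decide how to treat addresses that could not be geocoded.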

To summarise, the procedure for getting lat-long coordinates from an address is as follows:
- Convert the address to a suitable format and connect to the Google Maps URL using your key
- Get a response from the API and convert it into a dict using ```json.loads(r.text)```
- Get the lat-long coordinates using ```lat = r_dict['results'][0]['geometry']['location']['lat']``` and analogously for the longitude

**Writing a Function for this Procedure**

Since you may need to do this multiple times, let's write a function which takes in a user-defined address, converts it into a suitable format, and returns the lat-long coordinates as a tuple.



In [6]:
# Input to the fn: Address in standard human-readable form
# Output: Tuple (lat, lng)

api_key = "AIzaSyBXrK8md7uaOcpRpaluEGZAtdXS4pcI5xo"

def address_to_latlong(address):
    # convert address to the form x+y+z
    split_address = address.split(" ")
    address = "+".join(split_address)
    
    # pass the address to the URL
    url = "https://maps.googleapis.com/maps/api/geocode/json?address={0}&key={1}".format(address, api_key)
    
    # connect to the URL, get response and convert to dict
    r = requests.get(url)
    r_dict = json.loads(r.text)
    lat = r_dict['results'][0]['geometry']['location']['lat']
    lng = r_dict['results'][0]['geometry']['location']['lng']
    
    return (lat, lng)
    

# getting some coordinates
print(address_to_latlong("UpGrad, Nishuvi Building, Worli, Mumbai"))
print(address_to_latlong("IIIT Bangalore, Electronic City, Bangalore"))


SSLError: HTTPSConnectionPool(host='maps.googleapis.com', port=443): Max retries exceeded with url: /maps/api/geocode/json?address=UpGrad,+Nishuvi+Building,+Worli,+Mumbai&key=AIzaSyBXrK8md7uaOcpRpaluEGZAtdXS4pcI5xo (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))

Now, what can be a practical use case of using a geocoding API in data analysis? 

Say you are working in an ecommerce retail company, and you have a dataframe containing a list of customer addresses. Your logistics team wants to identify clusters of customers staying close by, so that they can plan the deliveries accordingly.

We have taken some real addresses as examples below. They are stored in a dataframe, and you want to add a column containing the (lat, lng) of each address. 


In [8]:
# Importing addresses file
add = pd.read_csv("addresses.txt", sep="\t", header = None)
add.head()


Unnamed: 0,0
0,"777 Brockton Avenue, Abington MA 2351"
1,"30 Memorial Drive, Avon MA 2322"
2,"250 Hartford Avenue, Bellingham MA 2019"
3,"700 Oak Street, Brockton MA 2301"
4,"66-4 Parkhurst Rd, Chelmsford MA 1824"


In [9]:
# renaming the column
add = add.rename(columns={0:'address'})
add.head()

Unnamed: 0,address
0,"777 Brockton Avenue, Abington MA 2351"
1,"30 Memorial Drive, Avon MA 2322"
2,"250 Hartford Avenue, Bellingham MA 2019"
3,"700 Oak Street, Brockton MA 2301"
4,"66-4 Parkhurst Rd, Chelmsford MA 1824"


We can now apply the function ```address_to_latlong()``` to the entire column of the dataframe. Since each call makes a network request and the function is therefore slow, we'll apply it only to the first few rows.

In [11]:
add.head()['address'].apply(address_to_latlong)

IndexError: list index out of range

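When the API calls succeed, this gives you the coordinates of the addresses, which you can store in a new column and use to write programs that cluster addresses that are close together. The ```IndexError``` above most likely arises because a denied request returns an empty ```results``` list, so ```r_dict['results'][0]``` fails.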
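Clustering nearby addresses needs a way to measure the distance between two coordinate pairs. One common choice (an assumption here, since the text does not prescribe a method) is the haversine great-circle distance. The sketch below uses an approximate mean Earth radius of 6371 km, and `haversine_km` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean Earth radius

def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lng) tuples given in degrees."""
    lat1, lng1 = map(math.radians, p1)
    lat2, lng2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Two points one degree of latitude apart are roughly 111 km from each other
print(round(haversine_km((42.0, -71.0), (43.0, -71.0)), 1))
```

Pairwise distances computed this way can then be passed to whichever clustering method suits the delivery problem.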

### Reading PDF Files in Python

Reading PDF files is not as straightforward as reading text or delimited files, since PDFs often contain images, tables, etc. PDFs are mainly designed to be human-readable, and thus you need special libraries to read them in python (or any other programming language).

Luckily, there are some really good libraries in Python. We will use ```PyPDF2``` to read PDFs in python, since it is easy to use and works with *most* types of PDFs. 

Note that python will only be able to read text from PDFs, not images, tables etc. (though that is possible using other specialised libraries).

You can install ```PyPDF2``` using ```pip install PyPDF2```.


For this illustration, we will read a PDF of the book 'Animal Farm' written by George Orwell. 


In [13]:
pip install PyPDF2

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
     ------------------------------------ 232.6/232.6 kB 330.9 kB/s eta 0:00:00
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1
Note: you may need to restart the kernel to use updated packages.


In [12]:
import PyPDF2

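# reading the pdf file
# 'your_file.pdf' is a placeholder: substitute the path to your copy of the PDF
pdf_object = open('your_file.pdf', 'rb')
pdf_reader = PyPDF2.PdfReader(pdf_object)

# Number of pages in the PDF file
#print(len(pdf_reader.pages))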

# get a certain page's text
page_object = pdf_reader.pages[10]

# Extract text from the page_object
print(page_object.extract_text())

NameError: name 'pdf_object' is not defined

In [13]:
import numpy as np
import pandas as pd

In [14]:
[num**2 for num in np.arange(3, 20, 2)]

[9, 25, 49, 81, 121, 169, 225, 289, 361]

In [15]:
import requests, bs4, json

In [16]:

api_key="579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b"
_format = 'json'
_url="https://data.gov.in/apis/3b5918c2-ddef-4535-bdb5-6b426024873d/resource/3b5918c2-ddef-4535-bdb5-6b426024873d"
_params = {'api-key':api_key, 'format':_format }
res=requests.get(url=_url, params =_params)
print(bs4.BeautifulSoup(res.text, 'html.parser').prettify())



<!DOCTYPE html>
<html lang="en">
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <link href="https://data.gov.in/favicon.ico" rel="shortcut icon" type="image/vnd.microsoft.icon"/>
  <meta content="width=device-width,initial-scale=1,maximum-scale=1" name="viewport"/>
  <meta content="E_mHMxl8OOc7JXOce6JaGYnsHTczS2spGA35pac5-m0" name="google-site-verification"/>
  <meta content="Open Government Data Platform (OGD) India is a single-point of access to Resources in an open format published by Ministries/Departments/Organizations of GoI. Get details of Open Data Events, Visualizations, Blogs, and Infographics." name="description"/>
  <meta content="Open Government Data Platform (OGD) India is a single-point of access to Resources in an open format published by Ministries/Departments/Organizations of GoI. Get details of Open Data Events, Visualizations, Blogs, and Infographics." name="abstract"/>
  <meta content="Open Government Data Platform (OGD) India is a

'''
Hello from the Guardian.

Thank you for registering with the open platform.

A new key has been created for you: 5b05fe07-dbf6-42b9-8dc3-bc33264fc871

You can try this key by accessing https://content.guardianapis.com/search?api-key=5b05fe07-dbf6-42b9-8dc3-bc33264fc871 in your browser.

For more details on how to use the open platform API, check out the documentation available at http://open-platform.theguardian.com/documentation/
'''

In [18]:
theguardian_api="https://content.guardianapis.com/search?api-key=5b05fe07-dbf6-42b9-8dc3-bc33264fc871"
res=requests.get(url=theguardian_api)
py_dic = json.loads(res.text)
py_dic['response']


{'status': 'ok',
 'userTier': 'developer',
 'total': 2413601,
 'startIndex': 1,
 'pageSize': 10,
 'currentPage': 1,
 'pages': 241361,
 'orderBy': 'newest',
 'results': [{'id': 'politics/live/2023/mar/23/boris-johnson-rishi-sunak-partygate-brexit-latest-politics-news-updates',
   'type': 'liveblog',
   'sectionId': 'politics',
   'sectionName': 'Politics',
   'webPublicationDate': '2023-03-23T14:27:36Z',
   'webTitle': 'Nicola Sturgeon defends record and offers advice to MSPs at final first minister’s questions – UK politics live',
   'webUrl': 'https://www.theguardian.com/politics/live/2023/mar/23/boris-johnson-rishi-sunak-partygate-brexit-latest-politics-news-updates',
   'apiUrl': 'https://content.guardianapis.com/politics/live/2023/mar/23/boris-johnson-rishi-sunak-partygate-brexit-latest-politics-news-updates',
   'isHosted': False,
   'pillarId': 'pillar/news',
   'pillarName': 'News'},
  {'id': 'business/live/2023/mar/23/bank-of-england-interest-rates-inflation-fed-hike-business-l

In [21]:
type(py_dic['response'])

dict

In [19]:
py_dic['response']['results']

[{'id': 'politics/live/2023/mar/23/boris-johnson-rishi-sunak-partygate-brexit-latest-politics-news-updates',
  'type': 'liveblog',
  'sectionId': 'politics',
  'sectionName': 'Politics',
  'webPublicationDate': '2023-03-23T14:27:36Z',
  'webTitle': 'Nicola Sturgeon defends record and offers advice to MSPs at final first minister’s questions – UK politics live',
  'webUrl': 'https://www.theguardian.com/politics/live/2023/mar/23/boris-johnson-rishi-sunak-partygate-brexit-latest-politics-news-updates',
  'apiUrl': 'https://content.guardianapis.com/politics/live/2023/mar/23/boris-johnson-rishi-sunak-partygate-brexit-latest-politics-news-updates',
  'isHosted': False,
  'pillarId': 'pillar/news',
  'pillarName': 'News'},
 {'id': 'business/live/2023/mar/23/bank-of-england-interest-rates-inflation-fed-hike-business-live',
  'type': 'liveblog',
  'sectionId': 'business',
  'sectionName': 'Business',
  'webPublicationDate': '2023-03-23T14:26:25Z',
  'webTitle': 'UK interest rates raised to 4.25

In [22]:
df= pd.DataFrame(py_dic['response']['results'])
df

Unnamed: 0,id,type,sectionId,sectionName,webPublicationDate,webTitle,webUrl,apiUrl,isHosted,pillarId,pillarName
0,politics/live/2023/mar/23/boris-johnson-rishi-...,liveblog,politics,Politics,2023-03-23T14:27:36Z,Nicola Sturgeon defends record and offers advi...,https://www.theguardian.com/politics/live/2023...,https://content.guardianapis.com/politics/live...,False,pillar/news,News
1,business/live/2023/mar/23/bank-of-england-inte...,liveblog,business,Business,2023-03-23T14:26:25Z,UK interest rates raised to 4.25% by Bank of E...,https://www.theguardian.com/business/live/2023...,https://content.guardianapis.com/business/live...,False,pillar/news,News
2,technology/2023/mar/23/tiktok-harvard-graduate...,article,technology,Technology,2023-03-23T14:26:17Z,TikTok’s Harvard graduate CEO Shou Zi Chew bat...,https://www.theguardian.com/technology/2023/ma...,https://content.guardianapis.com/technology/20...,False,pillar/news,News
3,commentisfree/2023/mar/23/boris-johnson-circus...,article,commentisfree,Opinion,2023-03-23T14:24:20Z,Is this the last hurrah for the Boris Johnson ...,https://www.theguardian.com/commentisfree/2023...,https://content.guardianapis.com/commentisfree...,False,pillar/opinion,Opinion
4,politics/2023/mar/23/rishi-sunak-saved-tax-cap...,article,politics,Politics,2023-03-23T14:17:47Z,"Rishi Sunak saved £300,000 in tax thanks to cu...",https://www.theguardian.com/politics/2023/mar/...,https://content.guardianapis.com/politics/2023...,False,pillar/news,News
5,world/2023/mar/23/pakistan-delays-punjab-elect...,article,world,World news,2023-03-23T14:14:21Z,Pakistan delays Punjab election despite suprem...,https://www.theguardian.com/world/2023/mar/23/...,https://content.guardianapis.com/world/2023/ma...,False,pillar/news,News
6,us-news/live/2023/mar/23/trump-indictment-hush...,liveblog,us-news,US news,2023-03-23T14:11:09Z,Further delay as Trump hush-money grand jury w...,https://www.theguardian.com/us-news/live/2023/...,https://content.guardianapis.com/us-news/live/...,False,pillar/news,News
7,world/live/2023/mar/23/russia-ukraine-war-live...,liveblog,world,World news,2023-03-23T14:08:20Z,Russia-Ukraine war live: Moscow says relations...,https://www.theguardian.com/world/live/2023/ma...,https://content.guardianapis.com/world/live/20...,False,pillar/news,News
8,australia-news/2023/mar/24/early-career-essent...,article,australia-news,Australia news,2023-03-23T14:00:56Z,Early career essential workers unable to affor...,https://www.theguardian.com/australia-news/202...,https://content.guardianapis.com/australia-new...,False,pillar/news,News
9,australia-news/2023/mar/24/ama-calls-for-gover...,article,australia-news,Australia news,2023-03-23T14:00:56Z,AMA calls for governments to implement royal c...,https://www.theguardian.com/australia-news/202...,https://content.guardianapis.com/australia-new...,False,pillar/news,News


In [23]:
df.isnull().sum()

id                    0
type                  0
sectionId             0
sectionName           0
webPublicationDate    0
webTitle              0
webUrl                0
apiUrl                0
isHosted              0
pillarId              0
pillarName            0
dtype: int64

In [24]:
df.isnull().sum(axis=1)

0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    0
9    0
dtype: int64

In [26]:
uk_crime_apiUrl="https://data.police.uk/api/crimes-street/all-crime?lat=52.629729&lng=-1.131592"
res= requests.get(uk_crime_apiUrl)
uk_crime_data_dict = json.loads(res.text)
uk_crime_data = pd.DataFrame(uk_crime_data_dict)
uk_crime_data

Unnamed: 0,category,location_type,location,context,outcome_status,persistent_id,id,location_subtype,month
0,anti-social-behaviour,Force,"{'latitude': '52.632985', 'street': {'id': 173...",,,,107770813,,2023-01
1,anti-social-behaviour,Force,"{'latitude': '52.635401', 'street': {'id': 173...",,,,107770503,,2023-01
2,anti-social-behaviour,Force,"{'latitude': '52.637791', 'street': {'id': 173...",,,,107770521,,2023-01
3,anti-social-behaviour,Force,"{'latitude': '52.636041', 'street': {'id': 173...",,,,107770524,,2023-01
4,anti-social-behaviour,Force,"{'latitude': '52.637257', 'street': {'id': 173...",,,,107770527,,2023-01
...,...,...,...,...,...,...,...,...,...
1433,other-crime,Force,"{'latitude': '52.635012', 'street': {'id': 173...",,"{'category': 'Under investigation', 'date': '2...",31fb2177d58b4e39ad61a4ca63623ad18400f44ba717b0...,107769378,,2023-01
1434,other-crime,Force,"{'latitude': '52.626202', 'street': {'id': 173...",,"{'category': 'Under investigation', 'date': '2...",4fa44bac0d1fc0c93cfbf6ce9118e28f408c411d7586f1...,107766321,,2023-01
1435,other-crime,Force,"{'latitude': '52.632387', 'street': {'id': 173...",,"{'category': 'Under investigation', 'date': '2...",b7e3162cea570a35d208baf3853b5b93e334081d139f9a...,107762814,,2023-01
1436,other-crime,Force,"{'latitude': '52.631302', 'street': {'id': 173...",,"{'category': 'Under investigation', 'date': '2...",158d9a0abc14c52192df8ba5697ec3b855262b165c44cd...,107766070,,2023-01


In [27]:
uk_crime_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1438 entries, 0 to 1437
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   category          1438 non-null   object
 1   location_type     1438 non-null   object
 2   location          1438 non-null   object
 3   context           1438 non-null   object
 4   outcome_status    1353 non-null   object
 5   persistent_id     1438 non-null   object
 6   id                1438 non-null   int64 
 7   location_subtype  1438 non-null   object
 8   month             1438 non-null   object
dtypes: int64(1), object(8)
memory usage: 101.2+ KB


In [28]:
uk_crime_data.describe()

Unnamed: 0,id
count,1438.0
mean,107766800.0
std,8622.222
min,107762200.0
25%,107764500.0
50%,107766600.0
75%,107768800.0
max,108079600.0


In [29]:
uk_crime_data.isnull().sum()

category             0
location_type        0
location             0
context              0
outcome_status      85
persistent_id        0
id                   0
location_subtype     0
month                0
dtype: int64

In [30]:
uk_crime_data.columns

Index(['category', 'location_type', 'location', 'context', 'outcome_status',
       'persistent_id', 'id', 'location_subtype', 'month'],
      dtype='object')

In [31]:
uk_crime_data['outcome_status']

0                                                    None
1                                                    None
2                                                    None
3                                                    None
4                                                    None
                              ...                        
1433    {'category': 'Under investigation', 'date': '2...
1434    {'category': 'Under investigation', 'date': '2...
1435    {'category': 'Under investigation', 'date': '2...
1436    {'category': 'Under investigation', 'date': '2...
1437    {'category': 'Under investigation', 'date': '2...
Name: outcome_status, Length: 1438, dtype: object

In [32]:
uk_crime_data['outcome_status'].value_counts().sort_values()

{'category': 'Action to be taken by another organisation', 'date': '2023-01'}               3
{'category': 'Offender given a caution', 'date': '2023-01'}                                 4
{'category': 'Further investigation is not in the public interest', 'date': '2023-01'}      5
{'category': 'Further action is not in the public interest', 'date': '2023-01'}             5
{'category': 'Formal action is not in the public interest', 'date': '2023-01'}             14
{'category': 'Local resolution', 'date': '2023-01'}                                        42
{'category': 'Awaiting court outcome', 'date': '2023-01'}                                  58
{'category': 'Unable to prosecute suspect', 'date': '2023-01'}                            245
{'category': 'Investigation complete; no suspect identified', 'date': '2023-01'}          410
{'category': 'Under investigation', 'date': '2023-01'}                                    567
Name: outcome_status, dtype: int64

In [33]:
path="./GOI_data/fci_Stock_Position_commodity_03_Rice-Parboiled_Bihar-2022.CSV"
fci_Stock_Pos_Parboiled_Rice_Bihar = pd.read_csv(path)
fci_Stock_Pos_Parboiled_Rice_Bihar

Unnamed: 0,Date,Code,CommodityId,CommodityName,DistrictName,DistrictCode,Stock,CommodityStock,TotalStock
0,2022-01-01T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,BHAGALPUR,EC12,6054.03340,670773.28615,670773.28615
1,2022-01-01T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,DARBHANGA,EC13,21699.65130,670773.28615,670773.28615
2,2022-01-01T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,GAYA,EC14,106308.35690,670773.28615,670773.28615
3,2022-01-01T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,MUZAFFARPUR,EC15,81102.67909,670773.28615,670773.28615
4,2022-01-01T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,PURNIA,EC16,58705.13805,670773.28615,670773.28615
...,...,...,...,...,...,...,...,...,...
67,2022-01-07T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,DARBHANGA,EC13,6716.86310,163408.06314,163408.06314
68,2022-01-07T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,GAYA,EC14,84482.61668,163408.06314,163408.06314
69,2022-01-07T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,PURNIA,EC16,12005.83255,163408.06314,163408.06314
70,2022-01-07T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,MOTIHARI,EC23,1952.12400,163408.06314,163408.06314


In [34]:
fci_Stock_Pos_Parboiled_Rice_Bihar.describe()

Unnamed: 0,CommodityId,Stock,CommodityStock,TotalStock
count,72.0,72.0,72.0,72.0
mean,3.0,67654.509586,732851.823288,732851.823288
std,0.0,43066.452493,186362.218032,186362.218032
min,3.0,1952.124,163408.06314,163408.06314
25%,3.0,39862.87785,704628.31665,704628.31665
50%,3.0,65618.718825,801968.0819,801968.0819
75%,3.0,89355.140043,856895.33853,856895.33853
max,3.0,155305.9019,858007.01939,858007.01939


In [35]:
fci_Stock_Pos_Parboiled_Rice_Bihar.columns

Index(['Date', 'Code', 'CommodityId', 'CommodityName', 'DistrictName',
       'DistrictCode', 'Stock', 'CommodityStock', 'TotalStock'],
      dtype='object')

In [36]:
# numeric_only=True keeps the sum to the numeric columns on newer pandas versions
fci_Stock_Pos_Parboiled_Rice_Bihar.groupby('DistrictName').sum(numeric_only=True)

Unnamed: 0_level_0,CommodityId,Stock,CommodityStock,TotalStock
DistrictName,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
BHAGALPUR,21,82345.61401,4871125.0,4871125.0
CHAPRA,18,504430.84159,4707717.0,4707717.0
DARBHANGA,21,270775.3033,4871125.0,4871125.0
GAYA,21,868084.40737,4871125.0,4871125.0
MOTIHARI,21,107960.6538,4871125.0,4871125.0
MUZAFFARPUR,18,413613.56734,4707717.0,4707717.0
PATNA,18,515692.54524,4707717.0,4707717.0
PURNIA,21,403565.01825,4871125.0,4871125.0
ROHTAS,21,908499.26714,4871125.0,4871125.0
SAHRSA,18,465483.86998,4707717.0,4707717.0


In [37]:
fci_Stock_Pos_Parboiled_Rice_Bihar.columns

Index(['Date', 'Code', 'CommodityId', 'CommodityName', 'DistrictName',
       'DistrictCode', 'Stock', 'CommodityStock', 'TotalStock'],
      dtype='object')

In [38]:
fci_Stock_Pos_Parboiled_Rice_Bihar[fci_Stock_Pos_Parboiled_Rice_Bihar['CommodityStock'] == fci_Stock_Pos_Parboiled_Rice_Bihar['TotalStock']]

Unnamed: 0,Date,Code,CommodityId,CommodityName,DistrictName,DistrictCode,Stock,CommodityStock,TotalStock
0,2022-01-01T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,BHAGALPUR,EC12,6054.03340,670773.28615,670773.28615
1,2022-01-01T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,DARBHANGA,EC13,21699.65130,670773.28615,670773.28615
2,2022-01-01T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,GAYA,EC14,106308.35690,670773.28615,670773.28615
3,2022-01-01T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,MUZAFFARPUR,EC15,81102.67909,670773.28615,670773.28615
4,2022-01-01T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,PURNIA,EC16,58705.13805,670773.28615,670773.28615
...,...,...,...,...,...,...,...,...,...
67,2022-01-07T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,DARBHANGA,EC13,6716.86310,163408.06314,163408.06314
68,2022-01-07T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,GAYA,EC14,84482.61668,163408.06314,163408.06314
69,2022-01-07T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,PURNIA,EC16,12005.83255,163408.06314,163408.06314
70,2022-01-07T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,MOTIHARI,EC23,1952.12400,163408.06314,163408.06314


In [40]:
fci_Stock_Pos_Parboiled_Rice_Bihar.sort_values(by='Stock', ascending=False)

Unnamed: 0,Date,Code,CommodityId,CommodityName,DistrictName,DistrictCode,Stock,CommodityStock,TotalStock
21,2022-01-02T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,ROHTAS,EC25,155305.90190,704628.31665,704628.31665
10,2022-01-01T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,ROHTAS,EC25,155305.90190,670773.28615,670773.28615
60,2022-01-06T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,PATNA,EC17,152817.88167,856895.33853,856895.33853
32,2022-01-03T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,ROHTAS,EC25,148218.22190,801968.08190,801968.08190
35,2022-01-04T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,GAYA,EC14,147390.62308,858007.01939,858007.01939
...,...,...,...,...,...,...,...,...,...
11,2022-01-02T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,BHAGALPUR,EC12,5961.80640,704628.31665,704628.31665
66,2022-01-07T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,BHAGALPUR,EC12,5379.10307,163408.06314,163408.06314
31,2022-01-03T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,MOTIHARI,EC23,4920.22840,801968.08190,801968.08190
20,2022-01-02T00:00:00Z,Region Name: Bihar,3,Rice-Parboiled,MOTIHARI,EC23,4920.22840,704628.31665,704628.31665


In [41]:
import numpy as np
import pandas as pd
import ssl

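# WARNING: this disables HTTPS certificate verification for the whole process;
# acceptable for a quick experiment, not for production code
ssl._create_default_https_context = ssl._create_unverified_context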

df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
df.head()

Unnamed: 0,Ord_id,Prod_id,Ship_id,Cust_id,Sales,Discount,Order_Quantity,Profit,Shipping_Cost,Product_Base_Margin
0,Ord_5446,Prod_16,SHP_7609,Cust_1818,136.81,0.01,23.0,-30.51,3.6,0.56
1,Ord_5406,Prod_13,SHP_7549,Cust_1818,42.27,0.01,13.0,4.56,0.93,0.54
2,Ord_5446,Prod_4,SHP_7610,Cust_1818,4701.69,0.0,26.0,1148.9,2.5,0.59
3,Ord_5456,Prod_6,SHP_7625,Cust_1818,2337.89,0.09,43.0,729.34,14.3,0.37
4,Ord_5485,Prod_17,SHP_7664,Cust_1818,4233.15,0.08,35.0,1219.87,26.3,0.38


In [42]:
df['Product_Base_Margin'].describe()

count    8290.000000
mean        0.512481
std         0.135560
min         0.350000
25%         0.380000
50%         0.520000
75%         0.590000
max         0.850000
Name: Product_Base_Margin, dtype: float64

In [44]:
df.isnull().sum()

Ord_id                   0
Prod_id                  0
Ship_id                  0
Cust_id                  0
Sales                   20
Discount                55
Order_Quantity          55
Profit                  55
Shipping_Cost           55
Product_Base_Margin    109
dtype: int64

In [45]:
# mean imputation: fill missing Product_Base_Margin values with the column mean
mean_value = df['Product_Base_Margin'].mean()
df.fillna({'Product_Base_Margin': mean_value}, inplace=True)

In [46]:
df.isnull().sum()

Ord_id                  0
Prod_id                 0
Ship_id                 0
Cust_id                 0
Sales                  20
Discount               55
Order_Quantity         55
Profit                 55
Shipping_Cost          55
Product_Base_Margin     0
dtype: int64

In [47]:
df.shape

(8399, 10)

In [48]:
round((df.isnull().sum()/len(df.index))*100, 2)

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.24
Discount               0.65
Order_Quantity         0.65
Profit                 0.65
Shipping_Cost          0.65
Product_Base_Margin    0.00
dtype: float64

In [49]:
ary = np.arange(1, 11*12+1).reshape(11,12)
ary

array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [ 13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24],
       [ 25,  26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36],
       [ 37,  38,  39,  40,  41,  42,  43,  44,  45,  46,  47,  48],
       [ 49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,  60],
       [ 61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72],
       [ 73,  74,  75,  76,  77,  78,  79,  80,  81,  82,  83,  84],
       [ 85,  86,  87,  88,  89,  90,  91,  92,  93,  94,  95,  96],
       [ 97,  98,  99, 100, 101, 102, 103, 104, 105, 106, 107, 108],
       [109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120],
       [121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132]])

In [50]:
np.unravel_index(100)

TypeError: unravel_index() missing required argument 'shape' (pos 2)
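The error arises because `np.unravel_index` needs the array's shape as a second argument in order to turn a flat index into row and column positions. A short sketch, using the same 11 x 12 array as above:

```python
import numpy as np

ary = np.arange(1, 11 * 12 + 1).reshape(11, 12)

# Convert flat (row-major) index 100 into a (row, column) pair
row, col = np.unravel_index(100, ary.shape)
print(int(row), int(col))

# The flat index and the 2-D position refer to the same element
print(ary[row, col] == ary.ravel()[100])
```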

In [None]:
# Given array
a = np.array([[4, 3, 1], [5, 7, 0], [9, 9, 3], [8, 2, 4]])

# Read the values of m and n

m = int(input())
n = int(input())

# Swap rows m and n using the values read above
a[[m, n]] = a[[n, m]]

# Print the array after swapping
print()
print(a)

In [None]:
n = int(input())

a = np.ones((n,n), dtype=int)
for row in np.arange(1, n-1):
    for col in np.arange(1, n-1):
        a[row, col]=0
print(a)

In [100]:
a= np.arange(1, 25).reshape(4,6)
a

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12],
       [13, 14, 15, 16, 17, 18],
       [19, 20, 21, 22, 23, 24]])

In [101]:
a[1:-1, 1:-1]

array([[ 8,  9, 10, 11],
       [14, 15, 16, 17]])

In [104]:
border_array= np.ones((5, 5), dtype=int)
border_array[1:-1, 1:-1]=0
print(border_array)

[[1 1 1 1 1]
 [1 0 0 0 1]
 [1 0 0 0 1]
 [1 0 0 0 1]
 [1 1 1 1 1]]
