Reading data from different data sources

In [52]:
# Reading data from CSV
import pandas as pd
from io import StringIO

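# Wrap the CSV text in an in-memory file-like buffer
buffer = StringIO("""
Name, Age, City
John, 30, New York
Jane, 25, Los Angeles
Bob, 35, Chicago
""")

# skipinitialspace=True skips the space after each comma; without it the
# column names would be " Age" and " City" (with a leading space)
df = pd.read_csv(buffer, skipinitialspace=True)
print(df)

   Name  Age         City
0  John   30     New York
1  Jane   25  Los Angeles
2   Bob   35      Chicago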
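A few other read_csv parameters are handy on the same data. The sketch below (csv_text and people are illustrative names; the rows are copied from the cell above) first shows the leading-space column names you get without skipinitialspace. It then uses usecols to keep only some columns and index_col to make Name the row index.

```python
from io import StringIO

import pandas as pd

# Same sample rows as above, with a space after each comma
csv_text = """Name, Age, City
John, 30, New York
Jane, 25, Los Angeles
Bob, 35, Chicago
"""

# Without skipinitialspace the spaces become part of the column names
raw_columns = list(pd.read_csv(StringIO(csv_text)).columns)
print(raw_columns)  # ['Name', ' Age', ' City']

# Keep two columns and use Name as the row labels
people = pd.read_csv(
    StringIO(csv_text),
    skipinitialspace=True,
    usecols=["Name", "Age"],
    index_col="Name",
)
print(people)
print(people.loc["Jane", "Age"])
```

Note that usecols=["Name", "Age"] would fail on the raw header, because the second column is actually named " Age".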


In [53]:
import pandas as pd
from io import StringIO

# pd.read_json() returns a DataFrame by default (typ='frame'). A single JSON
# object whose values are all scalars cannot become a DataFrame without an
# index (pandas raises ValueError), so read it as a Series with typ='series'.
data = '{"name": "John", "age": 30, "city": "New York"}'

output = pd.read_json(StringIO(data), typ='series')
print(output)

name        John
age           30
city    New York
dtype: object
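To see why typ='series' is needed, the sketch below calls read_json with its default typ='frame' on the same string and catches the resulting ValueError. If a one-row DataFrame is what you actually want, one option (not the only one) is to parse the string with the standard json module and wrap the resulting dict in a list. The names raised and row are illustrative.

```python
import json
from io import StringIO

import pandas as pd

data = '{"name": "John", "age": 30, "city": "New York"}'

# Default typ='frame': all-scalar values cannot form a DataFrame without an index
raised = False
try:
    pd.read_json(StringIO(data))
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# A list containing one dict becomes a DataFrame with one row
row = pd.DataFrame([json.loads(data)])
print(row)
```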


In [54]:
output.to_json()

'{"name":"John","age":30,"city":"New York"}'

In [55]:
output.to_json(orient='index')

'{"name":"John","age":30,"city":"New York"}'

In [56]:
output.to_json(orient='records')

'["John",30,"New York"]'
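The orient choice matters when reading the JSON back. orient='index' keeps the labels as JSON keys, so the Series can be rebuilt with read_json(..., typ='series'), while orient='records' keeps only the values. A small round-trip sketch on a Series built from the same values (the variable names are illustrative):

```python
import json
from io import StringIO

import pandas as pd

s = pd.Series({"name": "John", "age": 30, "city": "New York"})

as_index = s.to_json(orient="index")      # labels are kept as JSON keys
as_records = s.to_json(orient="records")  # only the values survive

# Rebuild the Series from the labelled JSON
restored = pd.read_json(StringIO(as_index), typ="series")
print(restored.to_dict())

# The records form is just a JSON array; the labels are gone
print(json.loads(as_records))
```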

In [57]:
import pandas as pd
from io import StringIO
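Data2 = '{"name": "John", "age": 30, "city": "New York", "job_profile":[{"title":"Software Engineer", "salary":5000}]}'

# The one-element list in "job_profile" gives the frame a length of 1: the scalar
# values are broadcast into that single row and the nested object stays a dict
output_data = pd.read_json(StringIO(Data2))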
output_data

,name,age,city,job_profile
0,John,30,New York,"{'title': 'Software Engineer', 'salary': 5000}"
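read_json leaves the nested job_profile object as a dict inside a single cell. pandas.json_normalize is a common way to flatten it instead. This sketch assumes the same Data2 structure, parsed with the json module first. record_path points at the list of nested records, and meta copies the top-level fields onto each resulting row.

```python
import json

import pandas as pd

data2 = '{"name": "John", "age": 30, "city": "New York", "job_profile":[{"title":"Software Engineer", "salary":5000}]}'
record = json.loads(data2)

# One row per element of job_profile; name/age/city are repeated on each row
flat = pd.json_normalize(record, record_path="job_profile", meta=["name", "age", "city"])
print(flat)
```

With several job_profile entries you would get one row per entry, each carrying the same name, age, and city.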


In [58]:
output_data.to_json()


'{"name":{"0":"John"},"age":{"0":30},"city":{"0":"New York"},"job_profile":{"0":{"title":"Software Engineer","salary":5000}}}'

In [59]:
output_data.to_json(orient='columns') # the DataFrame default orient is 'columns' (for a Series it is 'index')

'{"name":{"0":"John"},"age":{"0":30},"city":{"0":"New York"},"job_profile":{"0":{"title":"Software Engineer","salary":5000}}}'

In [60]:
output_data.to_json(orient='index')

'{"0":{"name":"John","age":30,"city":"New York","job_profile":{"title":"Software Engineer","salary":5000}}}'

In [61]:
output_data.to_json(orient='records')

'[{"name":"John","age":30,"city":"New York","job_profile":{"title":"Software Engineer","salary":5000}}]'
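orient='records' is also the only orientation that to_json accepts with lines=True, which writes one JSON object per line (JSON Lines). read_json(..., lines=True) reads that format back. A sketch with a small two-row frame (the rows are borrowed from the CSV example; the variable names are illustrative):

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"name": ["John", "Jane"], "age": [30, 25]})

# lines=True is only accepted together with orient='records'
jsonl = df.to_json(orient="records", lines=True)
print(jsonl)

# Each line is read back as one row
back = pd.read_json(StringIO(jsonl), lines=True)
print(back)
```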

In [62]:
dfd = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", header=None)
dfd.head()

,0,1,2,3,4,5,6,7,8,9,10,11,12,13
0,1,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050
2,1,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185
3,1,14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480
4,1,13.24,2.59,2.87,21.0,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735


In [63]:
dfd.to_csv("wine.csv")  # the index is written as an extra first column unless index=False
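Because the wine data was read with header=None, its column labels are the integers 0 to 13, and to_csv writes both that header row and the index by default. A sketch on a two-row, two-column slice of the values above shows the difference. index=False together with header=False writes the rows back in the same headerless layout as the source file.

```python
from io import StringIO

import pandas as pd

# Two rows and two columns copied from the wine output above
wine = pd.DataFrame([[1, 14.23], [1, 13.2]])

default_csv = wine.to_csv()                          # header "0,1" plus the index column
plain_csv = wine.to_csv(index=False, header=False)   # values only
print(default_csv)
print(plain_csv)

# Reading the default output back: index_col=0 restores the index, but the
# column labels come back as the strings "0" and "1", not integers
back = pd.read_csv(StringIO(default_csv), index_col=0)
print(list(back.columns))
```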

In [64]:
# Source: https://www.fdic.gov/bank-failures/failed-bank-list
# Alternatively, load a locally saved CSV copy of the list:
# bank_data = pd.read_csv('failed_bank_data.csv')

In [65]:
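url = "https://www.fdic.gov/bank-failures/failed-bank-list"
# read_html returns a list with one DataFrame per <table> found on the page
# (it needs lxml, or BeautifulSoup4 + html5lib, to be installed)
tables = pd.read_html(url)
tables[0]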

,Bank Name,City,State,Cert,Acquiring Institution,Closing Date,Fund Sort ascending
0,Metropolitan Capital Bank & Trust,Chicago,Illinois,57488,First Independence Bank,"January 30, 2026",10550
1,The Santa Anna National Bank,Santa Anna,Texas,5520,Coleman County State Bank,"June 27, 2025",10549
2,Pulaski Savings Bank,Chicago,Illinois,28611,Millennium Bank,"January 17, 2025",10548
3,The First National Bank of Lindsay,Lindsay,Oklahoma,4134,First Bank & Trust Co.,"October 18, 2024",10547
4,Republic First Bank dba Republic Bank,Philadelphia,Pennsylvania,27332,"Fulton Bank, National Association","April 26, 2024",10546
5,Citizens Bank,Sac City,Iowa,8758,Iowa Trust & Savings Bank,"November 3, 2023",10545
6,Heartland Tri-State Bank,Elkhart,Kansas,25851,"Dream First Bank, N.A.","July 28, 2023",10544
7,First Republic Bank,San Francisco,California,59017,"JPMorgan Chase Bank, N.A.","May 1, 2023",10543
8,Signature Bank,New York,New York,57053,"Flagstar Bank, N.A.","March 12, 2023",10540
9,Silicon Valley Bank,Santa Clara,California,24735,First Citizens Bank & Trust Company,"March 10, 2023",10539
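read_html returns the Closing Date column as plain text such as "March 10, 2023". The following sketch converts it to real datetimes with pd.to_datetime and an explicit format string so the column can be sorted or filtered by date. It uses two rows copied from the table above; banks is an illustrative name.

```python
import pandas as pd

# Two rows copied from the FDIC table above
banks = pd.DataFrame({
    "Bank Name": ["Signature Bank", "Silicon Valley Bank"],
    "Closing Date": ["March 12, 2023", "March 10, 2023"],
})

# %B = full month name, %d = day of month, %Y = four-digit year
banks["Closing Date"] = pd.to_datetime(banks["Closing Date"], format="%B %d, %Y")

# Earliest closing first
print(banks.sort_values("Closing Date"))
```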


In [None]:
import pandas as pd
import requests
from io import StringIO

url = "https://en.wikipedia.org/wiki/Mobile_country_code"

# Fetch the page with a browser-like User-Agent; the default requests agent may be rejected
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
response.raise_for_status()

# Since pandas 2.1, passing a literal HTML string to read_html is deprecated;
# wrap it in StringIO. read_html returns a list of DataFrames, so select the
# table you need by index (here the first one).
tables = pd.read_html(StringIO(response.text))
print(tables[0])


   MCC  MNC Brand      Operator       Status Bands (MHz)  \
0    1    1  TEST  Test network  Operational         any   
1    1    1  TEST  Test network  Operational         any   
2  999   99   NaN  Internal use  Operational         any   
3  999  999   NaN  Internal use  Operational         any   

                              References and notes  
0                                              NaN  
1                                              NaN  
2  Internal use in private networks, no roaming[6]  
3  Internal use in private networks, no roaming[6]  
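The same StringIO wrapping works for any HTML you already have as a string, which is also a convenient way to try table extraction offline. In this sketch the HTML snippet is made up for illustration. The match argument keeps only the tables whose text matches the given string or regular expression. As with the cells above, read_html needs lxml, or BeautifulSoup4 with html5lib, installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second mentions "Operator"
html = """
<table><tr><th>Year</th></tr><tr><td>2020</td></tr></table>
<table>
  <tr><th>MCC</th><th>MNC</th><th>Operator</th></tr>
  <tr><td>1</td><td>1</td><td>Test network</td></tr>
  <tr><td>999</td><td>99</td><td>Internal use</td></tr>
</table>
"""

# A leading row made only of <th> cells is used as the header
tables = pd.read_html(StringIO(html), match="Operator")
print(len(tables))
print(tables[0])
```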


In [None]:
pd.read_excel('sample_data.xlsx') # sheet_name=0 by default, i.e. the first sheet (.xlsx files need openpyxl)

,Name,Age
0,Ravi,33
1,Shivam,35
2,Akash,36


In [71]:
excel = pd.read_excel('sample_data.xlsx', sheet_name='Sheet2') # select a sheet by name (or by 0-based position)
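sheet_name=None reads every sheet at once and returns a dict of DataFrames keyed by sheet name. Since sample_data.xlsx is not available here, the sketch first writes a two-sheet workbook into an in-memory BytesIO buffer with pd.ExcelWriter. It assumes openpyxl is installed, and the values are copied from the outputs in this section.

```python
from io import BytesIO

import pandas as pd

people = pd.DataFrame({"Name": ["Ravi", "Shivam", "Akash"], "Age": [33, 35, 36]})
capitals = pd.DataFrame({"State": ["Karnataka"], "Capital": ["Bangalore"]})

# Write both frames as separate sheets of one workbook held in memory
buffer = BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    people.to_excel(writer, sheet_name="Sheet1", index=False)
    capitals.to_excel(writer, sheet_name="Sheet2", index=False)
buffer.seek(0)

# sheet_name=None -> {sheet name: DataFrame} for every sheet
sheets = pd.read_excel(buffer, sheet_name=None)
print(list(sheets))
print(sheets["Sheet2"])
```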

A pickle file is a binary file that stores Python objects serialized with Python's pickle module. Pickling (serialization) converts an object, such as a list, a dictionary, or an instance of a custom class, into a byte stream that can be saved to disk, sent over a network, or stored in a database. Unpickling (deserialization) rebuilds an equivalent object from that byte stream, including its attribute values and any shared or recursive references between objects. Classes and functions are stored by reference (their module and qualified name), not as code, so the same definitions must be importable when the data is loaded.

File extension: Typically .pkl or .pickle.
Binary format: Not human-readable; designed specifically for Python. 
Use cases: Saving trained machine learning models, caching computations, maintaining application state, or sharing data between Python programs.
Built-in module: Python’s pickle module provides pickle.dump() to write an object to a binary file and pickle.load() to read it back; pickle.dumps() and pickle.loads() do the same with bytes objects.
Security warning: Never unpickle data from untrusted sources; a crafted pickle can execute arbitrary code during deserialization, posing a serious security risk.
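The four pickle functions listed above can be sketched on a plain dictionary. The keys and values are hypothetical, and the file is written to a temporary directory. Two keys point at the same list, which shows that a shared reference is still shared after unpickling.

```python
import os
import pickle
import tempfile

# Hypothetical application state; two keys point at the same list object
shared = [0.1, 0.2, 0.3]
state = {"weights": shared, "backup": shared, "epoch": 5}

# pickle.dump / pickle.load work with files opened in binary mode
path = os.path.join(tempfile.mkdtemp(), "state.pkl")
with open(path, "wb") as f:
    pickle.dump(state, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

# pickle.dumps / pickle.loads work with bytes in memory
blob = pickle.dumps(state)
from_bytes = pickle.loads(blob)

print(loaded == state)
print(loaded["weights"] is loaded["backup"])  # the shared reference is preserved
```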

In [None]:
excel.to_pickle('df.pkl') # Serialization: write the DataFrame to a pickle file
pd.read_pickle('df.pkl') # Deserialization: load it back as a DataFrame

,State,Capital
0,Madhya Pradesh,Bhopal
1,Uttar Pradesh,Lucknow
2,Chattisgargh,Raipur
3,Karnataka,Bangalore
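DataFrame.to_pickle and pd.read_pickle default to compression='infer', so the file extension picks the compression. The following sketch writes the same small frame (rows copied from the output above) once as a plain .pkl and once as .pkl.gz into a temporary directory, then checks that both load back unchanged and that the .gz file really starts with the gzip magic bytes.

```python
import os
import tempfile

import pandas as pd

# Two rows copied from the output above
capitals = pd.DataFrame({"State": ["Madhya Pradesh", "Karnataka"], "Capital": ["Bhopal", "Bangalore"]})

folder = tempfile.mkdtemp()
plain_path = os.path.join(folder, "capitals.pkl")
gzip_path = os.path.join(folder, "capitals.pkl.gz")

capitals.to_pickle(plain_path)  # no recognised compression suffix: uncompressed
capitals.to_pickle(gzip_path)   # compression="infer" picks gzip from ".gz"

restored = pd.read_pickle(gzip_path)  # decompression is inferred the same way
print(restored.equals(capitals))

# gzip files start with the magic bytes 1f 8b
with open(gzip_path, "rb") as f:
    print(f.read(2) == b"\x1f\x8b")
```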
