
# **Data Import-Export in Python**
Google Colab is a popular platform for working with Python notebooks. Importing and exporting data there relies on a few libraries and methods, chosen according to the file format. Here's a brief guide to importing and exporting data in Colab:



## Working Directory
In Google Colab, the working directory defaults to `/content`, a folder on the temporary virtual machine that runs the notebook. Files saved there disappear when the runtime is reset, so data you want to keep usually lives in Google Drive, which can be mounted into the same file system. To work inside a specific directory, use the `%cd` magic command to change the current directory. Here's how to get and change the working directory in Google Colab:

### Get Current Working Directory:

In [38]:
import os

# Get current working directory
current_directory = os.getcwd()
print("Current Directory:", current_directory)

Current Directory: /content


### Mounting Google Drive:
To access files stored in Google Drive, or to use a Drive folder as the working directory, first mount the drive with the `drive.mount()` function.

In [2]:
from google.colab import drive

drive.mount('/content/drive')  # Mount Google Drive

Mounted at /content/drive


### Change Working Directory:

In [3]:
# Change working directory
new_directory = '/content/drive/MyDrive/Python'  # Replace with your desired directory path
%cd "$new_directory"


/content/drive/MyDrive/Python
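
The `%cd` magic only works inside a notebook. As a minimal sketch of the plain-Python equivalent, `os.chdir()` changes the working directory and `os.getcwd()` confirms the change. The temporary folder below is a stand-in for a real data folder, used only so the example runs anywhere.

```python
import os
import tempfile

# Remember where we started so we can return afterwards
original_directory = os.getcwd()

# A throwaway folder stands in for a real data directory (assumption for the demo)
with tempfile.TemporaryDirectory() as scratch_directory:
    os.chdir(scratch_directory)          # plain-Python counterpart of %cd
    inside_directory = os.getcwd()       # where we are now
    os.chdir(original_directory)         # go back before the folder is deleted

print("Started in:", original_directory)
print("Worked in: ", inside_directory)
```

Returning to the original directory at the end keeps later relative paths in the notebook pointing where you expect.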


### Check Files in the Directory:

In [None]:
# check files in any directory
os.listdir('/content/drive/MyDrive/Python')
# or just os.listdir() for current working directory
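
`os.listdir()` returns every entry in a folder. When only one file type matters, a pattern match can narrow the list. The sketch below uses `pathlib` for that; the helper name `list_data_files` and the demo file names are illustrative assumptions, not part of the original notebook.

```python
from pathlib import Path
import tempfile

def list_data_files(folder, pattern="*.csv"):
    """Return the sorted names of files in `folder` that match `pattern`."""
    return sorted(path.name for path in Path(folder).glob(pattern))

# Build a small throwaway folder so the example runs anywhere
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ("b.csv", "a.csv", "notes.txt"):
        (Path(demo_folder) / name).write_text("x\n")
    csv_files = list_data_files(demo_folder)

print(csv_files)
```

In Colab the same helper could be pointed at a mounted Drive folder instead of the temporary one.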

## Read/Import Data into Python:

### Reading CSV Files:

You can use the `pandas` library to read CSV files in Colab. If the file is hosted online, pass its URL directly to `pd.read_csv()`. If the file is on your own machine, upload it to Colab first with the file upload widget.

In [2]:
import pandas as pd

#### Read from Google Drive:

In [4]:
data_folder_drive = "/content/drive/MyDrive/data/"
test_data_csv = pd.read_csv(data_folder_drive + "test_data.csv")
test_data_csv.head()

Unnamed: 0,ID,treat,var,rep,PH,TN,PN,GW,ster,DTM,SW,GAs,STAs
0,1,Low As,BR01,1,84.0,28.3,27.7,35.7,20.5,126.0,28.4,0.762,14.6
1,2,Low As,BR01,2,111.7,34.0,30.0,58.1,14.8,119.0,36.7,0.722,10.77
2,3,Low As,BR01,3,102.3,27.7,24.0,44.6,5.8,119.7,32.9,0.858,12.69
3,4,Low As,BR06,1,118.0,23.3,19.7,46.4,20.3,119.0,40.0,1.053,18.23
4,5,Low As,BR06,2,115.3,16.7,12.3,19.9,32.3,120.0,28.2,1.13,13.72


#### Read from URL:

In [7]:
data_folder = "https://github.com/shiful133/data/raw/main/soil_data/"

test_data_csv = pd.read_csv(data_folder + "test_data.csv")
test_data_csv.head()

Unnamed: 0,ID,treat,var,rep,PH,TN,PN,GW,ster,DTM,SW,GAs,STAs
0,1,Low As,BR01,1,84.0,28.3,27.7,35.7,20.5,126.0,28.4,0.762,14.6
1,2,Low As,BR01,2,111.7,34.0,30.0,58.1,14.8,119.0,36.7,0.722,10.77
2,3,Low As,BR01,3,102.3,27.7,24.0,44.6,5.8,119.7,32.9,0.858,12.69
3,4,Low As,BR06,1,118.0,23.3,19.7,46.4,20.3,119.0,40.0,1.053,18.23
4,5,Low As,BR06,2,115.3,16.7,12.3,19.9,32.3,120.0,28.2,1.13,13.72
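
The `Unnamed: 0` column in the output appears when a CSV file was saved with its row index as an unlabeled first column. A small sketch of one way to handle it, assuming that is what happened here: pass `index_col=0` so pandas treats that column as the index. The text below is a short excerpt of the rows shown above, held in memory so the example needs no file.

```python
import io
import pandas as pd

# Excerpt of the data above; the empty first header cell mimics a saved index
csv_text = """,ID,treat,var,rep,PH
0,1,Low As,BR01,1,84.0
1,2,Low As,BR01,2,111.7
2,3,Low As,BR01,3,102.3
"""

# Default read: the unlabeled first column becomes "Unnamed: 0"
default_df = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses that first column as the row index instead
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_df.columns.tolist())
print(indexed_df.columns.tolist())
```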


### Reading XLSX Files:

You can also use the pandas library to read XLSX files.

In [42]:
test_data_xlsx = pd.read_excel(data_folder + "test_data.xlsx") # data_folder variable defined in previous code block
# Get Column names of dataframe
column_names = test_data_xlsx.columns.tolist()
print(column_names)

['ID', 'treat', 'var', 'rep', 'PH', 'TN', 'PN', 'GW', 'ster', 'DTM', 'SW', 'GAs', 'STAs']


### Reading .txt Files:
pandas can also read TXT files. A tab-separated or comma-separated text file is read in almost the same way as a CSV file: because tab-separated values are just another form of delimited text, you can call `read_csv()` and set the delimiter to a tab character.

In [34]:
test_data_txt = pd.read_csv(data_folder + "test_data.txt", delimiter='\t') # data_folder variable defined in previous code block
# Show first 5 rows for quick view with .head()
test_data_txt.head()

Unnamed: 0,ID,treat,var,rep,PH,TN,PN,GW,ster,DTM,SW,GAs,STAs
0,1,Low As,BR01,1,84.0,28.3,27.7,35.7,20.5,126.0,28.4,0.762,14.6
1,2,Low As,BR01,2,111.7,34.0,30.0,58.1,14.8,119.0,36.7,0.722,10.77
2,3,Low As,BR01,3,102.3,27.7,24.0,44.6,5.8,119.7,32.9,0.858,12.69
3,4,Low As,BR06,1,118.0,23.3,19.7,46.4,20.3,119.0,40.0,1.053,18.23
4,5,Low As,BR06,2,115.3,16.7,12.3,19.9,32.3,120.0,28.2,1.13,13.72
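
As a brief illustration of the delimiter option, the sketch below reads the same tab-separated text two ways: with `read_csv()` and `sep="\t"` (`delimiter` is an alias for `sep`), and with `pd.read_table()`, whose default separator is already a tab. The in-memory text is a small excerpt of the data above, used so the example runs without the remote file.

```python
import io
import pandas as pd

# A few tab-separated rows held in memory
tsv_text = (
    "ID\ttreat\tvar\tPH\n"
    "1\tLow As\tBR01\t84.0\n"
    "2\tLow As\tBR01\t111.7\n"
)

# Explicit tab separator with read_csv
via_read_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# read_table uses a tab separator by default
via_read_table = pd.read_table(io.StringIO(tsv_text))

print(via_read_csv.equals(via_read_table))
```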


### Reading files via upload:

In [None]:
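# Reading CSV from uploaded file
from google.colab import files
uploaded = files.upload()  # Opens the upload widget; returns a dict mapping each filename to its bytes

# The uploaded file is also saved in the working directory, so read the first one by name
df_uploaded_csv = pd.read_csv(next(iter(uploaded)))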

In [None]:
# Now you can work with df_uploaded_csv
df_uploaded_csv.head()
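
Because `files.upload()` returns a dictionary of filenames and raw bytes, the contents can also be read straight from memory. The sketch below simulates that dictionary with a hand-written `uploaded` value (an assumption, so the code runs outside Colab) and loads every CSV entry through `io.BytesIO`.

```python
import io
import pandas as pd

# Stand-in for the dict that files.upload() returns in Colab:
# keys are filenames, values are the file contents as bytes
uploaded = {
    "test_data.csv": b"ID,treat,PH\n1,Low As,84.0\n2,Low As,111.7\n",
}

# Read each uploaded CSV from its bytes, keyed by filename
dataframes = {
    name: pd.read_csv(io.BytesIO(content))
    for name, content in uploaded.items()
    if name.lower().endswith(".csv")
}

for name, frame in dataframes.items():
    print(name, frame.shape)
```

Reading from the bytes works for several uploaded files at once and does not depend on the current working directory.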

In [35]:
column_names = test_data_csv.columns.tolist()

print(column_names)

['ID', 'treat', 'var', 'rep', 'PH', 'TN', 'PN', 'GW', 'ster', 'DTM', 'SW', 'GAs', 'STAs']


### Reading JSON Files:
You can use the pandas library to read JSON files into a DataFrame.

In [46]:
test_data_json = pd.read_json(data_folder + "test_data.json")
# Get summary of the dataframe with .info()
test_data_json.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42 entries, 0 to 41
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   ID      42 non-null     int64  
 1   treat   42 non-null     object 
 2   var     42 non-null     object 
 3   rep     42 non-null     int64  
 4   PH      42 non-null     float64
 5   TN      42 non-null     float64
 6   PN      42 non-null     float64
 7   GW      42 non-null     float64
 8   ster    42 non-null     float64
 9   DTM     42 non-null     float64
 10  SW      42 non-null     float64
 11  GAs     42 non-null     float64
 12  STAs    42 non-null     float64
dtypes: float64(9), int64(2), object(2)
memory usage: 4.4+ KB
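
The layout of `test_data.json` is not shown here, so as a sketch under the assumption of a record-oriented file (a list of one object per row), this is how such JSON maps onto a DataFrame. The JSON text is a two-row excerpt written by hand for the example.

```python
import io
import pandas as pd

# Record-oriented JSON: one object per row (assumed layout for this demo)
json_text = """
[
  {"ID": 1, "treat": "Low As", "PH": 84.0},
  {"ID": 2, "treat": "Low As", "PH": 111.7}
]
"""

# Wrap the text in StringIO so pandas treats it as a file-like object
records_df = pd.read_json(io.StringIO(json_text), orient="records")

print(records_df)
```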


### Read Stata Data Files (.dta):
pandas can read Stata data files through `pd.read_stata()`. These files have the .dta extension and are commonly used in econometrics and statistics. Here's how to read a Stata data file using pandas:

In [8]:
test_data_dta = pd.read_stata(data_folder + "test_data.dta")
# Print the dataframe
print(test_data_dta)

ValueError: ignored

The Stata data file used here is version 110, but pandas supports importing only versions 105, 108, 111, 113, 114, 115, 117, 118, and 119. Because version 110 is not on that list, `pd.read_stata()` raises the ValueError shown above.

Third-party libraries such as pyreadstat support a wider range of Stata file versions, including version 110.

In [None]:
!pip install pyreadstat

In [44]:
import pyreadstat
test_data_dta = pyreadstat.read_dta(data_folder_drive + "test_data.dta")
# read_dta returns a (DataFrame, metadata) tuple, so printing it shows both parts
print(test_data_dta)

(    ID    treat        var  rep     PH    TN    PN    GW  ster    DTM     SW  \
0    1   Low As       BR01    1   84.0  28.3  27.7  35.7  20.5  126.0   28.4   
1    2   Low As       BR01    2  111.7  34.0  30.0  58.1  14.8  119.0   36.7   
2    3   Low As       BR01    3  102.3  27.7  24.0  44.6   5.8  119.7   32.9   
3    4   Low As       BR06    1  118.0  23.3  19.7  46.4  20.3  119.0   40.0   
4    5   Low As       BR06    2  115.3  16.7  12.3  19.9  32.3  120.0   28.2   
5    6   Low As       BR06    3  111.0  19.0  15.3  35.9  14.9  116.3   42.3   
6    7   Low As       BR28    1  114.3  21.7  19.3  56.2   6.1  123.7   35.4   
7    8   Low As       BR28    2  124.0  25.3  21.0  49.2   9.2  114.3   60.6   
8    9   Low As       BR28    3  120.3  23.0  19.0  48.6   4.2  113.3   69.8   
9   10   Low As       BR35    1  130.0  19.7  14.7  36.6  12.1  126.0   57.3   
10  11   Low As       BR35    2  133.3  21.0  16.3  39.9  11.5  130.7   53.0   
11  12   Low As       BR35    3  129.0 
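
One possible way to combine the two readers is to try pandas first and fall back to pyreadstat only when pandas rejects the file. The sketch below is an illustration, not the notebook's own method: the helper name `read_stata_with_fallback` is hypothetical, and the demo file is written by pandas itself, so only the pandas path is exercised here.

```python
import os
import tempfile
import pandas as pd

def read_stata_with_fallback(path):
    """Try pandas first; fall back to pyreadstat if pandas raises ValueError."""
    try:
        return pd.read_stata(path)
    except ValueError:
        # Imported lazily so pyreadstat is needed only when pandas fails
        import pyreadstat
        # read_dta returns a (DataFrame, metadata) tuple; keep the DataFrame
        dataframe, _metadata = pyreadstat.read_dta(path)
        return dataframe

# Small sample written to a temporary .dta file for the demo
sample = pd.DataFrame(
    {"ID": [1, 2], "treat": ["Low As", "Low As"], "PH": [84.0, 111.7]}
)

with tempfile.TemporaryDirectory() as scratch_directory:
    sample_path = os.path.join(scratch_directory, "sample.dta")
    sample.to_stata(sample_path, write_index=False)
    round_trip = read_stata_with_fallback(sample_path)

print(round_trip)
```

Unpacking the tuple inside the helper means the caller always gets a DataFrame, whichever library did the reading.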

In [41]:
import requests

# Build the full URL of the file to download
file_url = data_folder + "test_data.txt"

# Download the file using requests
response = requests.get(file_url)

if response.status_code == 200:
    # Extract the filename from the URL
    filename = file_url.split("/")[-1]
    # Save the content to a local file
    with open("/content/" + filename, "wb") as f:
        f.write(response.content)
    print(f"File '{filename}' downloaded successfully.")
else:
    print("Failed to download the file.")

File 'test_data.txt' downloaded successfully.
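
Splitting a URL on `/` works for simple links, but it keeps any query string attached to the last segment. The sketch below shows a more defensive way to derive the local filename using only the standard library. The helper name `filename_from_url` and the `?raw=true` variant of the URL are illustrative assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse, unquote

def filename_from_url(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    url_path = unquote(urlparse(url).path)
    return PurePosixPath(url_path).name

plain_url = "https://github.com/shiful133/data/raw/main/soil_data/test_data.txt"
query_url = plain_url + "?raw=true"  # hypothetical variant with a query string

print(filename_from_url(plain_url))
print(filename_from_url(query_url))
print(query_url.split("/")[-1])  # naive split keeps the query string
```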
