# Showcase 1 - Downloading Organization Data

This Jupyter Notebook demonstrates how to use the `open_ksa` module to download the datasets published by a specific organization. The example relies on the same search functionality as the source website: [Saudi Open Data](https://open.data.gov.sa/en/publishers).

## Workflow Overview

1. **Import Necessary Modules**: Import the required functions from the `open_ksa` module.
2. **Search for Organizations**: Use the search function to find organizations based on a keyword.
3. **Retrieve Organization ID**: Extract the organization ID from the search results.
4. **Get Organization Resources**: Fetch all dataset IDs and other resources associated with the organization.
5. **Download Data**: Download the datasets to a specified directory.

### Steps 1 and 2

This cell imports the module and searches for organizations that match a keyword, which is the discovery step of the workflow.


In [1]:
#!pip uninstall open_ksa
# Import the open_ksa module under a short alias
import open_ksa as ok

# Search for organizations whose name matches the keyword
orgs = ok.organizations(search="king saud university")
# len(orgs)


| publisherID                          | slug                 | Datasets |
-----------------------------------------------------------------------
| a9e617ff-d918-4f4d-8be1-c42b733b1143 | king_saud_university | 216      |
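
Step 3 of the workflow, retrieving the organization ID, could be sketched as follows. This is a minimal illustration that assumes the search result is a dictionary with a `content` list whose entries carry `publisherID` and `slug` keys, as the record outputs in this notebook suggest. The names `first_publisher_id` and `sample_orgs` are hypothetical and are not part of `open_ksa`; the sample dictionary stands in for a live response so the sketch runs offline.

```python
from typing import Optional


def first_publisher_id(orgs: dict) -> Optional[str]:
    """Return the publisherID of the first search hit, or None if there is none."""
    content = orgs.get("content") or []
    if not content:
        return None
    return content[0].get("publisherID")


# Stand-in for the value returned by ok.organizations(...). Only the keys
# used here are included (assumed structure, not a live API response).
sample_orgs = {
    "content": [
        {
            "publisherID": "a9e617ff-d918-4f4d-8be1-c42b733b1143",
            "slug": "king_saud_university",
        }
    ]
}

org_id = first_publisher_id(sample_orgs)
print(org_id)
```

Returning `None` for an empty result lets the calling code skip a keyword that matched nothing instead of failing with an `IndexError`.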


### Steps 3 - 5

Once you have found the organization whose data you want, you can look up all of its associated resources and download them to a local folder.

The output folder is given as a relative path, so it is created under the working directory where `python` is running. The allowed extensions specify which file types to download.
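
Because the output directory may need to exist before the download starts, it can help to build and create it up front. The sketch below assumes the same naming rule as the download cell in this notebook (trim, replace spaces with underscores, lowercase). The helper name `org_output_dir` and its `root` parameter are hypothetical and not part of `open_ksa`; a temporary directory is used so that running the example leaves nothing behind.

```python
import tempfile
from pathlib import Path


def org_output_dir(organization_name: str, root: str = "opendata") -> Path:
    """Build the per-organization folder path and create it if it is missing."""
    folder_name = organization_name.strip().replace(" ", "_").lower()
    path = Path(root) / folder_name
    # parents=True creates the root folder too; exist_ok=True makes reruns safe
    path.mkdir(parents=True, exist_ok=True)
    return path


# Demonstrate inside a throwaway directory instead of the real workspace
with tempfile.TemporaryDirectory() as tmp:
    location = org_output_dir(" King Saud University ", root=tmp)
    created = location.is_dir()

print(location.name)  # king_saud_university
```

In the notebook itself, the returned path could be converted with `str(location)` and passed as `output_dir`.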

#### Variables

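- **`dataset_ids`**: List of dataset IDs retrieved from the organization.
- **`download_stats`**: Dictionary keyed by publisher ID that stores the value returned by `ok.get_dataset_resources` for each organization.
- **`location`**: Directory path where the datasets will be saved.
- **`organization_id`**: Organization ID returned together with the resources.
- **`orgs`**: Dictionary containing search results for organizations; the matching records are under `orgs['content']`.
- **`resources`**: Dictionary containing the organization name, organization ID, and dataset IDs for the organization.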

This workbook provides a structured approach to interact with the API, search for organizations, retrieve relevant data, and download it for further analysis.


In [2]:
# The search results in orgs['content'] hold one record per organization, and each
# record carries its publisher ID under 'publisherID'.
# To download a single organization instead, use orgs['content'][N]['publisherID'] with
# the index N of the organization you want, or pass a publisher ID string of your choosing.
download_stats = {}
for org in orgs['content']:
    resources = ok.get_org_resources(org_id=org['publisherID'])
    # Skip organizations for which no resources were returned
    if not resources:
        continue
    # All dataset IDs that belong to the organization
    dataset_ids = resources['dataset_ids']
    # The organization ID is returned here as well; it matches org['publisherID']
    organization_id = resources['organization_id']
    # Folder named after the organization, e.g. opendata/king_saud_university
    location = f"opendata/{resources['organization_name'].strip().replace(' ', '_').lower()}"
    # Download all matching data resources for the organization
    download_stats[org['publisherID']] = ok.get_dataset_resources(
        dataset_ids=dataset_ids,
        allowed_exts=["csv", "json", "xlsx", "xls"],
        ext_dir=True,
        #verbose=False,
        # Change output_dir to save the files somewhere else
        # Note: you may have to create the directory first
        output_dir=location,
    )


In [9]:
# List the publisher IDs recorded in download_stats
list(download_stats)

['a9e617ff-d918-4f4d-8be1-c42b733b1143']

In [4]:
# Pick the organization record whose slug is 'king_saud_university' (None if it is absent)
king_saud_university = next((org for org in orgs['content'] if org['slug'] == 'king_saud_university'), None)
king_saud_university


{'publisherID': 'a9e617ff-d918-4f4d-8be1-c42b733b1143',
 'slug': 'king_saud_university',
 'nameAr': 'جامعة الملك سعود',
 'nameEn': 'King Saud University',
 'descriptionAr': 'جامعة الملك سعود',
 'descriptionEn': 'King Saud University',
 'addressAr': None,
 'addressEn': None,
 'numberOfDatasets': 216,
 'logo': 'null',
 'websiteUrl': 'www.ksu.edu.sa',
 'type': 'GOVERNMENTAL',
 'status': 'PUBLISHED',
 'ratings': None,
 'rating': None,
 'createdAt': '2019-07-02',
 'updatedAt': None,
 'numberOfFollowers': 0}

In [5]:
import os
import pandas as pd

# List the files in the csv subdirectory of the organization's folder
csv_dir = "opendata/king_saud_university/csv"
csv_files = os.listdir(csv_dir)
print(csv_files)

# Load the second CSV file in the listing (index 1)
selected_csv = csv_files[1]
sel_csv_path = os.path.join(csv_dir, selected_csv)
df = pd.read_csv(sel_csv_path)

print("Number of rows in the dataset: ", df.shape[0])

# Print the first 10 rows; a bare df.head(10) would not be shown because
# only the last expression of a cell is displayed
print(df.head(10))

df.info()




['Scholarship students 2024 csv.csv', 'data_of_students_enrolled_during_the_academic_year_1441ah.csv', 'data_of_activities_provided_to_students_during_the_year_of_1442.csv', 'ksu-dmo-od-dataset-communityservice-medcity-1437-438-ah.csv', 'data_of_scientific_publications_during_the_academic_year_1438-1439_ah.csv', 'data_of__faculty_members_during_the_academic_year__1442_ah.csv', 'ksu-dmo-od-dataset-sciresarch-publications-1433-1434-ah.csv', 'ksu-dmo-od-dataset-scholarships-graduates-1436-1437-ah.csv', 'ksu-dmo-od-dataset-scholarships-graduates-1442-ah.csv', 'data_of__employees__during_the_academic_year_1443_ah.csv', 'ksu-dmo-od-dataset-scholarships-graduates-1439-1440-ah.csv', 'data_of_university_medical_city_during_the_year_of__1441_ah.csv', 'ksu-dmo-od-dataset-communityservice-medcity-1436-437-ah.csv', 'data_of_graduate_students__during_the_academic_year_1438-1439_ah.csv', 'ksu-dmo-od-dataset-scholarships-graduates-1431-1432-ah.csv', 'ksu-dmo-od-dataset-scholarships-students-1442-ah.cs
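
The path `opendata/king_saud_university/csv` used above suggests that, with `ext_dir=True`, downloads are grouped into one subfolder per file extension. Under that assumption, a quick inventory of what was downloaded could look like the sketch below. The helper `count_files_by_extension` is hypothetical and not part of `open_ksa`, and the folder tree is created in a temporary directory with empty placeholder files so the example runs without any downloads.

```python
import tempfile
from pathlib import Path


def count_files_by_extension(org_dir: Path) -> dict:
    """Map each extension subfolder name to the number of files it contains."""
    counts = {}
    for sub in sorted(p for p in org_dir.iterdir() if p.is_dir()):
        counts[sub.name] = sum(1 for f in sub.iterdir() if f.is_file())
    return counts


# Build a throwaway folder tree that mimics the assumed layout
with tempfile.TemporaryDirectory() as tmp:
    org_dir = Path(tmp) / "king_saud_university"
    placeholder_files = {"csv": ["a.csv", "b.csv"], "xlsx": ["c.xlsx"]}
    for ext, names in placeholder_files.items():
        (org_dir / ext).mkdir(parents=True)
        for name in names:
            (org_dir / ext / name).touch()
    counts = count_files_by_extension(org_dir)

print(counts)  # {'csv': 2, 'xlsx': 1}
```

Pointing `org_dir` at the real download folder would give a per-format file count to check before loading anything into pandas.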

In [6]:
# The import is repeated so that this cell can run on its own
import open_ksa as ok

# Search for organizations matching the keyword "nuclear"
orgs = ok.organizations(search="nuclear")

# Show the first matching organization record
orgs['content'][0]







| publisherID                          | slug                                           | Datasets |
-------------------------------------------------------------------------------------------------
| 044b2cb2-db38-48ff-ae38-be9391f6c248 | nuclear-and-radiological-regulatory-commission | 1220     |


{'publisherID': '044b2cb2-db38-48ff-ae38-be9391f6c248',
 'slug': 'nuclear-and-radiological-regulatory-commission',
 'nameAr': 'هيئة الرقابة النووية والإشعاعية',
 'nameEn': 'Nuclear And Radiological Regulatory Commission',
 'descriptionAr': 'تنتشــر فــي المملكــة ومنــذ عقــود تطبيقــات مختلفـة للتقنيات الإشـعاعية فـي مجـالات عـدة، وتشــهد هــذه التطبيقــات نمــوا في مجالات حيويـة مثـل الطـب والصناعـة والبحـث والتعليـم . ويجري حالياً تمكين تقنيات الطاقة النووية للمساهمة في تنوع مصادر الطاقة الوطنية في المملكة حيث يتم العمل على إنشاء اول محطة طاقة نووية لتوليد الكهرباء في المملكة. وهذه الممارسات النووية والإشعاعية تتطلب دوراً رقابياً في منظومة وطنية مناسبه وفعالة لضمان تحقق الأمان و الحماية و تنظيم الأنشطة و الممارسات و المرافق لتلك الإستخدامات. كما أن المملكة تلتزم بدورها الرقابي على الصعيد الدولي من خلال إيفاءها بمتطلبات عدد من  الصكوك الدولية  التي تؤكد جميعها على إيجاد أنظمة وطنية رقابية  في الشأن النووي والإشعاعي  وتعيين هيئات وطنية مستقلة وممكّنة بإنفاذ هذه الأنظمة،  فقد صدر قرار 

In [10]:
ok.get_org_resources(org_id=orgs['content'][0]['publisherID'])


{'organization_name': 'nuclear and radiological regulatory commission',
 'organization_id': '044b2cb2-db38-48ff-ae38-be9391f6c248',
 'dataset_ids': ['c32c58d9-9505-466b-b68f-987ce347b14d',
  'ac6831a8-5c2d-42fb-add9-ee28a6539d06',
  'a089c6ae-4b05-48d6-9beb-95bd7d4f88f8',
  '0a492be1-f985-44fd-ab5b-47d2f731debb',
  '4c43c323-dcdb-4224-8293-26978459ee5c',
  '80f6e5e1-35d6-47a8-a243-15cbc5bbdcee',
  'a5a0b179-dc95-489e-be15-2d75393a1a78',
  '6783a7f8-b761-4e97-82c9-8eb3b6fee1b1',
  '598692db-4c22-405d-929c-639415f8f831',
  'f1eed9f7-91dd-4828-954a-38007f50c16c',
  '89deed5b-9e3c-4403-b0c9-5d33b72b3658',
  '279d81d9-8c86-4261-8c61-fa81ca210aae',
  '81e79275-88f1-49a2-872b-a1d0e195a504',
  '4d31f7d7-e9e8-4727-9162-34feb272f175',
  '1b36100b-9359-4bc6-9871-274806b80383',
  '48c598fd-b1d5-4d4e-9bf4-38bbe5fdac73',
  '4d39e916-f6ca-42da-87e4-983447f2649a',
  'c4491dac-fdcf-43a2-8512-e173b3883002',
  '54ac79a9-cdf0-4606-847e-a29e4180ac13',
  '5b5d4dc5-82fa-45b9-8a51-ff1b1cc59b7e',
  '9aeff664-5