<img src="images/scfhs-logo.png">

## Table of contents:
> This notebook covers the following:
* [Analysis Introduction and Objective](#first-bullet)
* [Data Collection](#second-bullet)
* [Data Preprocessing and Exploration](#third-bullet)
* [Modeling](#fourth-bullet)
* [Results](#fifth-bullet)
* [Used Resources](#sixth-bullet)

### Analysis Introduction and Objective  <a class="anchor" id="first-bullet"></a>
As there is life and joy, there is also death and sadness. To those who have lost their loved ones: my sincere condolences.<br> 
   
The objective of this analysis is to accept or reject the hypothesis of Andrew's friend, which states that "Some in the Muslim community think people tend to die more (often) in the month of Shaban."<br>

This analysis was carried out as part of the pre-interview challenge for the Data Scientist position. 

### Data Collection  <a class="anchor" id="second-bullet"></a>
To pursue this challenge we are going to use the data from [Madinah Municipality](https://services.amana-md.gov.sa/eservicesite/Inq/DeathInquiry.aspx).<br> 

We are going to collect the death data from 1435-01-01 to 1439-12-30, a five-year period covering the Hijri years 1435 through 1439. 


Unfortunately, their website does not seem to offer an API for gathering the data, which would have made things a lot easier. Thus we are going to scrape it from the website ourselves, using a Python library called [Selenium](https://selenium-python.readthedocs.io/).<br>

Selenium is a powerful automation tool that will allow us to automate the page navigation and scrape the inquiry results table.

>**Note:**
    Please keep in mind that we are going to assume that this sample data represents the total Muslim population. 


We will start by importing the necessary libraries that we are going to use in this notebook. 

In [6]:
# Import necessary libraries
import pandas as pd
import csv
import time  # needed for time.sleep() in the scraping loop
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.select import Select
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException

Then, we will define the **set_window_size()** function, which will be used to change the dropdown values of the inquiry form. 

In [7]:
def set_window_size(driver, f_year, f_month, f_day,t_year, t_month, t_day):
    '''
    Takes the year, month and day values for both from and to dropdown items and sets their value.
    Input: webdriver, year, month and day.
    Output: no return, changes the dropdown values according to input values. 
    '''
    # From 
    from_year=Select(driver.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_cboYFrom"]')[0])
    from_year.select_by_visible_text(f_year)
    from_month=Select(driver.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_cboMFrom"]')[0])
    from_month.select_by_visible_text(f_month)
    from_day=Select(driver.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_cboDFrom"]')[0])
    from_day.select_by_visible_text(f_day)
    # To 
    to_year=Select(driver.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_cboYTo"]')[0])
    to_year.select_by_visible_text(t_year)
    to_month=Select(driver.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_cboMTo"]')[0])
    to_month.select_by_visible_text(t_month)  
    to_day=Select(driver.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_cboDTo"]')[0])
    to_day.select_by_visible_text(t_day)
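
The six near-identical lookups above could also be driven by a small table. The sketch below only builds the mapping from dropdown element id to the text to select; the ids are the ones used in the function above, while `ID_PREFIX` and `dropdown_values` are hypothetical names introduced for illustration. Applying each entry with `Select(...).select_by_visible_text(...)` is left out so the example runs without a browser.

```python
# Prefix shared by all six dropdown ids used in set_window_size().
ID_PREFIX = "ctl00_ContentPlaceHolder1_cbo"


def dropdown_values(f_year, f_month, f_day, t_year, t_month, t_day):
    """Map each dropdown element id to the visible text to select (hypothetical helper)."""
    parts = {
        "YFrom": f_year, "MFrom": f_month, "DFrom": f_day,
        "YTo": t_year, "MTo": t_month, "DTo": t_day,
    }
    return {ID_PREFIX + suffix: value for suffix, value in parts.items()}


# Same arguments as the query window used in this notebook
values = dropdown_values('1435', '01', '01', '1439', '12', '30')
for element_id, text in values.items():
    print(element_id, '->', text)
```

Keeping the ids in one place would make it easier to adjust the scraper if the form's element ids ever change.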

We also need to define a **find()** function, which checks whether the page links exist yet. It will be used while waiting for the element to be created, to avoid a Stale Element Reference exception.

In [8]:
def find(driver):
    '''
    Takes a webdriver and checks whether the page links of the result table exist.
    Input: webdriver
    Output: list of loaded link elements if any exist, otherwise False
    '''
    element = driver.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_dgDeath"]/tbody/tr[1]/td/a')
    if element:
        return element
    else:
        return False
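
The waiting idea behind `WebDriverWait(driver, 10).until(find)` can be sketched in plain Python: call a condition repeatedly until it returns something truthy, and give up after a timeout. This is a simplified stand-in, not Selenium's implementation; `wait_until`, `links_loaded` and the toy page names are made up for the illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # hand back whatever the condition produced
        if time.monotonic() >= deadline:
            raise TimeoutError('condition was not met within %.1f seconds' % timeout)
        time.sleep(poll)


# Toy condition: returns False twice, then a non-empty list of "links"
attempts = {'count': 0}


def links_loaded():
    attempts['count'] += 1
    return ['page 2', 'page 3'] if attempts['count'] >= 3 else False


links = wait_until(links_loaded, timeout=2.0, poll=0.01)
print(links, 'after', attempts['count'], 'attempts')
```

This is why **find()** returns the element list itself rather than `True`: the waiting call passes the truthy result back to the caller.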

We will now open a **Chrome** browser, load the Madinah Municipality website, and set the date window of our query to run from 1435-01-01 to 1439-12-30. 

In [None]:
# Open up a Chrome browser and navigate to web page.
driver = webdriver.Chrome() 
driver.get("https://services.amana-md.gov.sa/eservicesite/Inq/DeathInquiry.aspx")

# Set data window size to be from 1435-01-01 to 1439-12-30
set_window_size(driver, '1435', '01', '01', '1439', '12', '30')

# Click on inquiry button 
inquiry_button = driver.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_btnSubmit"]')[0]
inquiry_button.click()

# Load pages links
page_links = driver.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_dgDeath"]/tbody/tr[1]/td/a')

Running the cell below will loop through the result pages, scraping the table data and saving it into a csv file called Death_inquiry_from_1435_to_1439. 

In [None]:
restart = True
first_time = True
table_data = []
temp_row=[]

while restart:
    for i in range(len(page_links)):# Loop through pages 
        
        # Parse the result table and append it to table_data list to write it into a csv file
        html = driver.page_source
        soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
        table_id="ctl00_ContentPlaceHolder1_dgDeath"
        for table in soup.findAll('table',{"id":table_id}):
            
            for tr in table.findAll('tr'): 
                    row = [td for td in tr.stripped_strings]
                    if not len(row)> 5 and 'تاريخ الدفن' not in row: # Skip page numbering and column text
                        table_data.append(row)
                    
                    
        try:
            # Wait for element creation to avoid Stale Element Reference exception;
            # the wait sits inside the try so its TimeoutException is caught below
            element = WebDriverWait(driver, 10).until(find)
            # Click on the '...' to load the next batch of pages 
            if i == len(page_links)-1 and first_time:# the '...' xpath is different first time
                # Write data to file and clear the list for the next batch pages content 
                with open('Death_inquiry_from_1435_to_1439.csv', 'a', encoding="utf-8") as csvFile:
                    for row in table_data:
                        writer = csv.writer(csvFile)
                        writer.writerow(row)
                    csvFile.close()
                    table_data=[]
                try:
                    driver.find_element_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_dgDeath"]/tbody/tr[1]/td/a[20]').click()
                except NoSuchElementException:  # no '...' link, so there is no further batch of pages
                    print('This element no longer exists')
                first_time=False 
            elif i == len(page_links)-1 and not first_time:
                # Write data to file and clear the list for the next batch pages content 
                with open('Death_inquiry_from_1435_to_1439.csv', 'a', encoding="utf-8") as csvFile:
                    for row in table_data:
                        writer = csv.writer(csvFile)
                        writer.writerow(row)
                    csvFile.close()
                    table_data=[]
                try:
                    driver.find_element_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_dgDeath"]/tbody/tr[1]/td/a[21]').click()
                except NoSuchElementException:  
                    print('This element no longer exists')
            elif not element[i].text == '...':
                # Write data to file and clear the list for the next batch pages content 
                with open('Death_inquiry_from_1435_to_1439.csv', 'a', encoding="utf-8") as csvFile:
                    for row in table_data:
                        writer = csv.writer(csvFile)
                        writer.writerow(row)
                    csvFile.close()
                    table_data=[]
                    # Move to next page 
                    element[i].click()
            else:
                continue
        except TimeoutException:
            print('Time out exception, waiting for 10 seconds')
            time.sleep(10)
        
    # Load pages links
    page_links = driver.find_elements_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_dgDeath"]/tbody/tr[1]/td/a')
     
    try:
        if not first_time:
            # if there are no more pages represented by the '...' symbol, stop the while loop 
            elem = driver.find_element_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_dgDeath"]/tbody/tr[1]/td/a[21]')
    except NoSuchElementException:  # the '...' link is gone, so this was the last batch
        print('No more pages!')
        restart = False

        
# Close the driver 
driver.close()
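
The loop above repeats the same "filter rows, then append to the csv file" steps in three branches. A minimal sketch of how those two steps could be pulled into helpers is shown below. `is_data_row`, `append_rows` and `HEADER_TEXT` are hypothetical names, the sample rows are made-up placeholders rather than scraped data, and the filter simply mirrors the condition used in the loop.

```python
import csv

HEADER_TEXT = 'تاريخ الدفن'  # column header text that the loop above skips


def is_data_row(row):
    """Mirror the loop's filter: at most 5 cells and not the header row."""
    return len(row) <= 5 and HEADER_TEXT not in row


def append_rows(path, rows):
    """Append rows to a csv file; the with block closes the file on exit."""
    with open(path, 'a', encoding='utf-8', newline='') as csv_file:
        csv.writer(csv_file).writerows(rows)


# Placeholder rows: one data-like row, one header-like row, one pagination-like row
sample_rows = [
    ['1', 'placeholder name', '1435-01-01'],
    ['placeholder header', HEADER_TEXT],
    ['1', '2', '3', '4', '5', '6'],
]
kept = [row for row in sample_rows if is_data_row(row)]
print(kept)
```

Passing `newline=''` to `open()` is what the `csv` module documentation recommends, and it avoids blank lines between rows on Windows. Each branch of the loop could then call `append_rows('Death_inquiry_from_1435_to_1439.csv', table_data)` before clearing `table_data`.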