-----
# Geonames Description

The GeoNames geographical database covers all countries and contains over eleven million unique features that are available for download free of charge. <br>
Website URL: http://www.geonames.org/

The GeoNames geographical database is available for download free of charge under a Creative Commons Attribution license. It contains over 25 million geographical names for over 11 million unique features, of which 4.8 million are populated places, plus 13 million alternate names. All features are categorized into one of nine feature classes and further subcategorized into one of 645 feature codes.
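
The nine feature classes are single letters. The lookup table below is a small sketch, assuming the class descriptions listed on the GeoNames feature-codes page (https://www.geonames.org/export/codes.html); `FEATURE_CLASSES` and `describe_feature` are illustrative names, not part of any GeoNames client.

```python
# Hypothetical lookup table: GeoNames feature-class letter -> short description.
# Descriptions follow https://www.geonames.org/export/codes.html (verify before use).
FEATURE_CLASSES = {
    "A": "country, state, region, ...",
    "H": "stream, lake, ...",
    "L": "parks, area, ...",
    "P": "city, village, ...",
    "R": "road, railroad",
    "S": "spot, building, farm",
    "T": "mountain, hill, rock, ...",
    "U": "undersea",
    "V": "forest, heath, ...",
}


def describe_feature(feature_class: str, feature_code: str) -> str:
    """Return a readable label such as 'P.PPLC (city, village, ...)'."""
    description = FEATURE_CLASSES.get(feature_class, "unknown class")
    return f"{feature_class}.{feature_code} ({description})"


# PPLC is the feature code for the capital of a political entity
print(describe_feature("P", "PPLC"))
```

The `feature-class` and `feature-code` columns read in section 4.0 could be passed to `describe_feature` to label rows.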

The data is accessible free of charge through a number of web services and a daily database export. GeoNames serves more than 150 million web service requests per day.

GeoNames integrates geographical data such as place names in various languages, elevation, and population from various sources. All latitude/longitude coordinates are in WGS84 (World Geodetic System 1984). Users can manually edit, correct, and add new names through a user-friendly wiki interface.

### Data Sources

A comprehensive list of all data sources used to populate the placenames in every country and/or region throughout the world: <br>
* https://www.geonames.org/datasources/

### Client Libraries

Client libraries that simplify access to the GeoNames web services are available in multiple programming languages:
* https://www.geonames.org/export/client-libraries.html
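
The web services can also be called directly over HTTP. The sketch below only builds a request URL for the `searchJSON` service with the standard library and sends nothing; it assumes you have registered a GeoNames account with web services enabled, and `your_username` is a placeholder. Check the parameter list in the GeoNames web service documentation before relying on it.

```python
from urllib.parse import urlencode

# Base endpoint of the GeoNames JSON search web service.
SEARCH_URL = "http://api.geonames.org/searchJSON"


def build_search_url(query: str, username: str, max_rows: int = 10) -> str:
    """Build a searchJSON request URL (no request is sent)."""
    params = {"q": query, "maxRows": max_rows, "username": username}
    return f"{SEARCH_URL}?{urlencode(params)}"


print(build_search_url("london", "your_username"))
```

The resulting URL could then be fetched with `requests.get(url).json()`; the matching places are returned under the `geonames` key.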

-----
# 1.0 - Setup Environment

In [None]:
print('\n-------------------------------------------------------------------------------------------------------------------------------')
print('1.0 - Setup Environment: \n')

"""
DESCRIPTION:
This section installs and imports the required Python packages. Package versions are not
pinned, so the latest available versions are installed. Run the install step only if your
system has not been set up previously. Consult the script author or your local Python
specialist for initial system configuration.

INSTRUCTIONS:
- Add/remove packages as needed to customize the script.
- Pin package versions (e.g. package==x.y.z) if you need a reproducible build.

"""

#-------------------------------------------------------------------------------------------------------------------------------
# Step 1.1 - Start Timing Metrics

import time
code_start = time.time()
cell_start = time.time()
from time import sleep
from datetime import date, datetime, timedelta
import sys


#-------------------------------------------------------------------------------------------------------------------------------
# Step 1.2 - Install Packages

print('  ↓ Installing packages...')

#--------------------------------------------------------
# Install Packages using pip:

!pip install requests numpy pandas beautifulsoup4 googletrans textblob langdetect openpyxl humanize psutil


#--------------------------------------------------------
# Install Packages using conda:

# !conda install --yes --prefix {sys.prefix} -c conda-forge requests numpy pandas beautifulsoup4 googletrans textblob langdetect openpyxl humanize psutil


#-------------------------------------------------------------------------------------------------------------------------------
# Step 1.3 - Import Packages

print('  ↓ Importing packages...')

# Packages: System
import sys
import os
from os import listdir
import gc
import copy
import shutil
import threading
import zipfile
from glob import glob              # glob() function; used to list input files in Step 4.2
import psutil

# Packages: File formats
import openpyxl                    # Excel engine used by DataFrame.to_excel

# Packages: Computing
import re
import numpy as np
import pandas as pd
import humanize

# Packages: HTTP Requests
import requests
import urllib.request
from urllib.request import urlopen, urlretrieve
from urllib.error import HTTPError
from bs4 import BeautifulSoup

# Packages: Language Translation
import googletrans
from googletrans import Translator
from langdetect import detect
from textblob import TextBlob
from textblob.exceptions import NotTranslated

# Create a translator object
translator = Translator()


print('  ✓ SUCCESS: Environment setup complete! \n')

#-------------------------------------------------------------------------------------------------------------------------------
# System Memory

mem = psutil.virtual_memory()                  # Take one snapshot so all three values are consistent
print("    - Virtual Memory Total:     ", humanize.naturalsize(mem.total))
print("    - Virtual Memory Used:      ", mem.percent, "%")
print("    - Virtual Memory Available: ", round(mem.available * 100 / mem.total, 2), "% \n")


#--------------------------------------------------------------------------------------------------
# Cell Runtime

cell_end = time.time()
seconds = cell_end - cell_start
minutes, seconds = divmod(seconds, 60)
hours, minutes = divmod(minutes, 60)
days, hours = divmod(hours, 24)                # Weeks are not split out, so days include whole weeks


#-------------------------------------------------------------------------------------------------------------------------------
print('>>> Process complete. Check for errors and continue.')
print("  - Runtime (d:h:m:s):  {:0>2}d,{:0>2}h,{:0>2}m,{:05.2f}s".format(int(days),int(hours), int(minutes), seconds))





-------------------------------------------------------------------------------------------------------------------------------
1.0 - Setup Environment: 

  ↓ Installing packages...
  ↓ Importing packages...
  ✓ SUCCESS: Environment setup complete! 

    - Virtual Memory Total:      27.4 GB
    - Virtual Memory Used:       32.4 %
    - Virtual Memory Available:  67.63925328864393 % 

>>> Process complete. Check for errors and continue.
  - Runtime (d:h:m:s):  00d,00h,00m,06.80s
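
-----

Every cell ends with the same `divmod` chain to report its runtime. A small helper could replace those copies; the sketch below is an illustrative refactor (the name `format_runtime` is hypothetical) that produces the same `d,h,m,s` string the cells print. Whole weeks are kept in the day count, so runs longer than seven days are still reported correctly.

```python
import time


def format_runtime(elapsed_seconds: float) -> str:
    """Format elapsed seconds as the notebook's 'DDd,HHh,MMm,SS.SSs' string."""
    minutes, seconds = divmod(elapsed_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)      # days is not reduced modulo 7
    return "{:0>2}d,{:0>2}h,{:0>2}m,{:05.2f}s".format(
        int(days), int(hours), int(minutes), seconds)


start = time.time()
# ... cell work ...
print("  - Runtime (d:h:m:s): ", format_runtime(time.time() - start))
```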


-----
# 2.0 - Access Geonames Data

In [None]:
print('\n-------------------------------------------------------------------------------------------------------------------------------')
print('2.0 - Access Geonames Data: \n')

"""
DESCRIPTION:
This section accesses the GeoNames download page and saves every .zip file it links to
into the output directory. Extraction and archiving happen in section 3.0.

INSTRUCTIONS:
- Adjust the URL, output directory, and file extensions as needed.
- Run cell to download and extract data.

"""

cell_start = time.time()

#-------------------------------------------------------------------------------------------------------------------------------
# Step 2.1 - Set variables

print('  ↓ Setting variables...')

# Set the URL to webscrape
geonames_url = 'http://download.geonames.org/export/dump/'

# Set output directory
output_dir = '/home/jovyan/work/Geonames/_data/'

# Set desired file extensions for download (tuple, as required by str.endswith)
file_ext = ('.zip',)

#-------------------------------------------------------------------------------------------------------------------------------
# Step 2.2 - Create output directories

if not os.path.exists(output_dir):
    os.makedirs(output_dir)                                           # Create directory (and any missing parents)
    print("  ↓ Output directory created:     " , output_dir)
else:
    print("  ↓ Output directory exists:       " , output_dir)         # Print output directory path


#-------------------------------------------------------------------------------------------------------------------------------
# Step 2.3 - Run HTML Request

print('  ↓ Connecting to Geonames API...')

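try:
    # Connect to the URL; raise an error on HTTP 4xx/5xx responses
    response = requests.get(geonames_url, allow_redirects=True, timeout=60)
    response.raise_for_status()

    # Parse HTML and save to BeautifulSoup object
    soup = BeautifulSoup(response.text, "html.parser")

except requests.exceptions.RequestException as e:
    print('    ⃠   ERROR: Could not connect to source. Check inputs and retry. \n')
    raise                                              # Stop here: Step 2.4 needs 'soup'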

#-------------------------------------------------------------------------------------------------------------------------------
# Step 2.4 - Loop through all 'a' tags and download every linked .zip file

print('  ↓ Downloading data. Please be patient... \n')

for one_a_tag in soup.find_all('a', href=True):             # The 'a' tags represent HTML links in the source code
    link = one_a_tag['href']

    # The extension filter skips non-data links such as the column-sort and parent-directory links
    if not link.endswith(tuple(file_ext)):
        continue

    download_url = geonames_url + link
    filename = os.path.join(output_dir, os.path.basename(link))

    try:
        urllib.request.urlretrieve(download_url, filename)  # Save file into output_dir
    except OSError as e:                                    # HTTPError and URLError are subclasses of OSError
        print('    ⃠   ERROR: Could not retrieve ' + download_url + '. Check inputs and retry. \n')
        print(e)
        continue

    time.sleep(1)                                           # Pause the code for 1 second

#--------------------------------------------------------------------------------------------------
# Cell Runtime

cell_end = time.time()
seconds = cell_end - cell_start
minutes, seconds = divmod(seconds, 60)
hours, minutes = divmod(minutes, 60)
days, hours = divmod(hours, 24)                # Weeks are not split out, so days include whole weeks


#-------------------------------------------------------------------------------------------------------------------------------
print('>>> Process complete. Check for errors and continue.')
print("  - Runtime (d:h:m:s):  {:0>2}d,{:0>2}h,{:0>2}m,{:05.2f}s".format(int(days),int(hours), int(minutes), seconds))




-------------------------------------------------------------------------------------------------------------------------------
2.0 - Access Geonames Data: 

  ↓ Setting variables...
  ↓ Output directory created:      /home/jovyan/work/Geonames/_data/
  ↓ Connecting to Geonames API...
  ↓ Downloading data. Please be patient... 

>>> Process complete. Check for errors and continue.
  - Runtime (d:h:m:s):  00d,00h,09m,03.95s
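
-----

Step 2.4 relies on the download page being a plain directory listing whose links end in the file name. The same link filtering can be sketched without BeautifulSoup using the standard library's `html.parser`; `LinkCollector` and `zip_links` are hypothetical names, and this is an alternative sketch rather than the notebook's method.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def zip_links(html_text, extensions=(".zip",)):
    """Return hrefs ending in one of `extensions`, in page order."""
    collector = LinkCollector()
    collector.feed(html_text)
    return [href for href in collector.hrefs if href.endswith(extensions)]
```

In the notebook, `zip_links(response.text)` would return the relative file names that Step 2.4 downloads; sort links such as `?C=N;O=D` are dropped because they do not end in `.zip`.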


-----
# 3.0 - Organize Geonames Data

In [None]:
print('\n-------------------------------------------------------------------------------------------------------------------------------')
print('3.0 - Organize Geonames Data: \n')

"""
DESCRIPTION:
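This section extracts all .zip files in the output directory into that same directory,
creates the folder structure, archives unneeded files, and moves the .zip files to '_zip/'.

INSTRUCTIONS:
- Specify the desired directory path between quotation marks.
- Run cell to extract and organize files.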

"""

cell_start = time.time()

#-------------------------------------------------------------------------------------------------------------------------------
# Step 3.1 - Extract Data

output_dir = '/home/jovyan/work/Geonames/_data/'

print('Extract compressed data: ')
print('  ↓ Extracting .zip files to: ', output_dir)

os.chdir(output_dir)                                   # Later steps move files by relative name
for item in os.listdir(output_dir):                    # Loop through items in output_dir
    if item.endswith(".zip"):                          # Check for ".zip" extension
        file_name = os.path.join(output_dir, item)     # Get full path of file
        with zipfile.ZipFile(file_name) as zip_ref:    # Open zip file; closed automatically
            zip_ref.extractall(output_dir)             # Extract contents to output_dir

print('  ✓ SUCCESS: Extraction complete! \n')


#-------------------------------------------------------------------------------------------------------------------------------
# Step 3.2 - Create Directory Structure

print('Create output directories: ')

# Create target directory paths
target_dir1 = output_dir + '_archive/'
target_dir2 = output_dir + '_csv/'
target_dir3 = output_dir + '_txt/'
target_dir4 = output_dir + '_xlsx/'
target_dir5 = output_dir + '_zip/'
target_dir6 = output_dir + '_errors/'

#------------------------------------------------------------------------------------
for target_dir in [target_dir1, target_dir2, target_dir3, target_dir4, target_dir5, target_dir6]:
    if not os.path.exists(target_dir):                # Create directory if missing
        os.mkdir(target_dir)
        print("  ↓ Directory created: " , target_dir)
    else:
        print("  ↓ Directory exists:  " , target_dir)


print('  ✓ SUCCESS: Directory structure complete! \n')


#-------------------------------------------------------------------------------------------------------------------------------
# Step 3.3 - Archive unneeded files

print('Archive unneeded files:')
print('  ↓ Transferring files to: ', target_dir1)

# Files that are not per-country data; Step 4.2 processes every .txt file left in output_dir
file_list = ['adminCode5.txt', 'allCountries.txt','alternateNames.txt','alternateNamesV2.txt',
             'cities500.txt', 'cities1000.txt','cities5000.txt','cities15000.txt',
             'hierarchy.txt','iso-languagecodes.txt','no-country.txt','readme.txt',
             'shapes_all_low.txt','shapes_simplified_low.json','userTags.txt']

for item in os.listdir(output_dir):                                  # Loop through items in output_dir
    if item in file_list:                                            # Archive only files named in file_list
        src_file = os.path.join(output_dir, item)                    # Set source path
        dst_file = os.path.join(target_dir1, item)                   # Set destination path
        shutil.move(src_file, dst_file)                              # Move file to destination path

print('  ✓ SUCCESS: File archive complete! \n')

#-------------------------------------------------------------------------------------------------------------------------------
# Step 3.4 - Move Files to Target Directories

print('Organize .zip files: ')

#------------------------------------------------------------------------

print('  ↓ Transferring .zip files to: ', target_dir5, "\n")

# Set desired file extensions (tuple, as required by str.endswith)
file_zip = ('.zip',)

for item in os.listdir(output_dir):                                  # Loop through items in output_dir
    if item.endswith(file_zip):                                      # Check for ".zip" extension
        src_file = os.path.join(output_dir, item)                    # Set source path
        dst_file = os.path.join(target_dir5, item)                   # Set destination path
        shutil.move(src_file, dst_file)                              # Move file to destination path

#------------------------------------------------------------------------

# print('  ↓ Transferring .txt files to: ', target_dir3)

# # Set desired file extensions (tuple format)
# file_txt = ['.txt']

# for item in os.listdir(output_dir):                                                            # Loop through items in output_dir
    
#     if item.endswith(tuple(file_txt)):                                                       # Check for ".txt" extension
#         dst_file = os.path.join(target_dir3, os.path.basename(item))         # Set destination path
#         shutil.move(item, dst_file)                                                             # Move files to destination path
    
#     else:
#         # print('    ⃠   ERROR: Could not transfer data. Check inputs and retry.', item, ' \n')
#         continue

# print(" ")

#--------------------------------------------------------------------------------------------------
# Cell Runtime

cell_end = time.time()
seconds = cell_end - cell_start
minutes, seconds = divmod(seconds, 60)
hours, minutes = divmod(minutes, 60)
days, hours = divmod(hours, 24)                # Weeks are not split out, so days include whole weeks


#-------------------------------------------------------------------------------------------------------------------------------
print('>>> Process complete. Check for errors and continue.')
print("  - Runtime (d:h:m:s):  {:0>2}d,{:0>2}h,{:0>2}m,{:05.2f}s".format(int(days),int(hours), int(minutes), seconds))



-------------------------------------------------------------------------------------------------------------------------------
3.0 - Organize Geonames Data: 

Extract compressed data: 
  ↓ Extracting .zip files to:  /home/jovyan/work/Geonames/_data/
  ✓ SUCCESS: Extraction complete! 

Create output directories: 
  ↓ Directory created:  /home/jovyan/work/Geonames/_data/_archive/
  ↓ Directory created:  /home/jovyan/work/Geonames/_data/_csv/
  ↓ Directory created:  /home/jovyan/work/Geonames/_data/_txt/
  ↓ Directory created:  /home/jovyan/work/Geonames/_data/_xlsx/
  ↓ Directory created:  /home/jovyan/work/Geonames/_data/_zip/
  ↓ Directory created:  /home/jovyan/work/Geonames/_data/_errors/
  ✓ SUCCESS: Directory structure complete! 

Archive unneeded files:
  ↓ Transferring files to:  /home/jovyan/work/Geonames/_data/_archive/
  ✓ SUCCESS: File archive complete! 

Organize .zip files: 
  ↓ Transferring .zip files to:  /home/jovyan/work/Geonames/_data/_zip/
>>> Process complete. Chec
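
-----

Steps 3.1 to 3.4 can also be written with `pathlib`, which avoids changing the working directory, and `mkdir(parents=True, exist_ok=True)` makes the folder setup safe to re-run. The sketch below is an alternative under the same folder layout, not the notebook's code; `organize` and its `archive_names` argument are hypothetical names.

```python
import shutil
import zipfile
from pathlib import Path

SUBDIRS = ["_archive", "_csv", "_txt", "_xlsx", "_zip", "_errors"]


def organize(output_dir, archive_names):
    """Extract every .zip in output_dir, then sort files into subfolders."""
    root = Path(output_dir)
    for name in SUBDIRS:                          # Step 3.2: no error if the folder exists
        (root / name).mkdir(parents=True, exist_ok=True)

    zips = sorted(root.glob("*.zip"))
    for zip_path in zips:                         # Step 3.1: extract next to the archives
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(root)

    for name in archive_names:                    # Step 3.3: move unneeded files
        src = root / name
        if src.exists():
            shutil.move(str(src), str(root / "_archive" / name))

    for zip_path in zips:                         # Step 3.4: move the .zip files
        shutil.move(str(zip_path), str(root / "_zip" / zip_path.name))
```

Called as `organize(output_dir, file_list)`, it would leave only the extracted per-country files in `output_dir`.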

-----
# 4.0 - Process Source Data

In [None]:
print('\n-------------------------------------------------------------------------------------------------------------------------------')
print('4.0 - Process Source Data: \n')

"""
DESCRIPTION:
This section reads each GeoNames country .txt file left in the output directory into a
'pandas' dataframe, cleans it, detects the source language of each place name, and exports
the results to .csv and .xlsx for editing and merging in later steps.

INSTRUCTIONS:
- Set the input paths between quotation marks, using forward slashes.
- Once read into memory, the dataframes can be inspected directly.
- Update data variables prior to running cell.

"""
cell_start = time.time()

#--------------------------------------------------------------------------------------------------
# Step 4.1 - Specify Geonames Input Files/Paths

#---------------------------------------------
# Set output directory
output_dir = '/home/jovyan/work/Geonames/_data/'

#---------------------------------------------
# Set target directories
target_dir1 = output_dir + "_archive/"
target_dir2 = output_dir + "_csv/"
target_dir3 = output_dir + "_txt/"
target_dir4 = output_dir + "_xlsx/"
target_dir5 = output_dir + "_zip/"
target_dir6 = output_dir + "_errors/"


#--------------------------------------------------------------------------------------------------
# Step 4.2 - Loop to Import and Process Country Files

for txt_file in glob(os.path.join(output_dir, '*.txt')):
    try:
        with open(txt_file) as txt_input:
            try:
                #-------------------------------------------------------------------------------------------
                # Import data: .txt file

                print('\n----------------------------------------------------------------------------------')
                print('Importing file:   ' + txt_file + "\n")
                
                country_file = os.path.basename(txt_file)          # e.g. 'AD.txt'
                country_cc = os.path.splitext(country_file)[0]     # e.g. 'AD'
                
                print('  ↓ Country Code: ' + country_cc)

                #-------------------------------------------------------------------------------------------
                # Get File Size

                if os.path.exists(txt_file):
                    file_size = os.path.getsize(txt_file)
                    file_size = humanize.naturalsize(file_size)
                    print('  ↓ File Size:   ', file_size)
                    
                else:
                    print('   ⃠   ERROR: Input file does not exist. Check inputs and retry. \n')
                    pass
                
                
                #-------------------------------------------------------------------------------------------
                # Generate Dataframe
                
                print('  ↓ Generating dataframe... ')
                
                #---------------------------------------------
                # Define Data Types
                
                dtypes_dict = {'geonameid' : int,
                               'name' : str,
                               'asciiname' : str,
                               'alternatenames' : str,
                               'latitude' : float,
                               'longitude' : float,
                               'feature-class' : str,
                               'feature-code' : str,
                               'country-code' : str,
                               'alt-country-codes' : str,
                               'admin-code-1' : str,
                               'admin-code-2' : str,
                               'admin-code-3' : str,
                               'admin-code-4' : str,
                               'population' : int,
                               'elevation' : str,
                               'dem' : str,
                               'timezone' : str,
                               'modification-date' : str }

                #---------------------------------------------
                # Read File with Formatting
                
                df1 = pd.read_csv(txt_file, 
                                  sep = "\t", 
                                  names = ['geonameid','name','asciiname','alternatenames','latitude',
                                         'longitude','feature-class','feature-code','country-code',
                                         'alt-country-codes','admin-code-1','admin-code-2',
                                         'admin-code-3','admin-code-4','population',
                                         'elevation','dem','timezone','modification-date'], 
                                  dtype=dtypes_dict,
                                  encoding='utf-8-sig',
                                  on_bad_lines='skip',        # Replaces error_bad_lines=False (removed in pandas 2.0)
                                  quoting=3,                  # csv.QUOTE_NONE: fields are unquoted, keep '"' literally
                                  keep_default_na=False,      # Keep codes such as 'NA' (Namibia) as text
                                  na_values=['none'],
                                  low_memory=False)

                #-------------------------------------------------------------------------------------------
                # Clean Dataframe
                
                print('  ↓ Cleaning dataframe... ')
                
                try:
                    #-----------------------------------------------------------
                    # Clean data fields
                    
                    df1['alternatenames'] = df1['alternatenames'].str.replace(',',', ')
                    df1['alternatenames'] = df1['alternatenames'].str.replace('  ',' ')

                    #-----------------------------------------------------------
                    # Convert NaN values to blank string

                    df1 = df1.fillna('')

                    #-----------------------------------------------------------
                    # Reindex data
                    
                    df1_header = ['geonameid','name','alternatenames','asciiname','src_lang',
                                  'src_lang_id','latitude','longitude','feature-class','feature-code',
                                  'country-code','alt-country-codes','admin-code-1','admin-code-2',
                                  'admin-code-3','admin-code-4','population','elevation','dem',
                                  'timezone','modification-date']
                    df1 = df1.reindex(columns = df1_header)
                                    
                except ValueError as e:
                    print("   ⃠   ERROR: Could not clean data. Check inputs and retry. \n")
                    print(e)
                    pass
                
                #-------------------------------------------------------------------------------------------
                # Populate Language Fields
                
                print('  ↓ Extracting language...')

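                try:
                    #-----------------------------------------------------------
                    # Source language list (language code -> language name)
                    
                    lang_list = googletrans.LANGUAGES
                    
                    #-----------------------------------------------------------
                    # Calculating source language id
                    # langdetect is non-deterministic unless seeded, and raises
                    # LangDetectException for names with no detectable letters

                    from langdetect import DetectorFactory
                    from langdetect.lang_detect_exception import LangDetectException
                    DetectorFactory.seed = 0

                    def safe_detect(text):
                        try:
                            return detect(text)
                        except LangDetectException:
                            return ''                          # Leave language blank if undetectable

                    df1['src_lang_id'] = df1['name'].astype(str).apply(safe_detect)
                    
                    #-----------------------------------------------------------
                    # Calculating source language names (blank if the id is unknown)
                    
                    df1['src_lang'] = df1['src_lang_id'].map(lang_list).fillna('').str.title()
                                        
                except ValueError as e:
                    print("   ⃠   ERROR: Could not extract language data. Check inputs and retry. \n")
                    print(e)
                    pass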

                #-------------------------------------------------------------------------------------------
                # Create Translated Copy of Dataframe
                
                print('  ↓ Create translated dataframe... ')
                
                try:
                    #-----------------------------------------------------------
                    # Create copy of dataframe
                    df2 = df1.copy()

                    #-----------------------------------------------------------
                    # Reindex fields
                                        
                    df2_header = ['name', 'name-chinese']
                    df2 = df2.reindex(columns = df2_header)
                                    
                except ValueError as e:
                    print("   ⃠   ERROR: Could not copy data. Check inputs and retry. \n")
                    print(e)
                    pass

                
                #-------------------------------------------------------------------------------------------
                # Export Dataframe

                print('  ↓ Export dataframes to .csv format in: ', target_dir2)
                
                try:
                    df1_csv = target_dir2 + country_cc + '.csv'
                    df2_csv = target_dir2 + country_cc + '_edit.csv'

                    df1.to_csv(df1_csv, sep=',', encoding='utf-8-sig', index=False)
                    df2.to_csv(df2_csv, sep=',', encoding='utf-8-sig', index=False)

                except ValueError as e:
                    print('   ⃠   ERROR: Export unsuccessful. Check inputs and retry. \n')
                    print(e)
                    pass
                
                
                #--------------------------------------------------
                # Export Dataframe to .xlsx
                
                print('  ↓ Export dataframes to .xlsx format in: ', target_dir4)
                
                try:
                    df1_xlsx = target_dir4 + country_cc + '.xlsx'
                    df2_xlsx = target_dir4 + country_cc + '_edit.xlsx'

                    # to_excel has no 'encoding' argument in pandas 2.x; .xlsx is always Unicode.
                    # Sheets hold at most 1,048,576 rows, so very large countries raise ValueError here.
                    df1.to_excel(df1_xlsx, index=False)
                    df2.to_excel(df2_xlsx, index=False)

                except ValueError as e:
                    print('   ⃠   ERROR: Export unsuccessful. Check inputs and try again. \n')
                    print(e)
                    pass
                
            except Exception as e:
                print('   ⃠   ERROR: Country data could not be processed. Check inputs and retry.')
                print(e)
                
                #--------------------------------------------------
                # If error, move source .txt file to _errors folder

                src = os.path.join(output_dir, country_file)
                dst = os.path.join(target_dir6, country_file)
                shutil.move(src, dst)
                
                print('  * File moved to _errors folder for reprocessing. \n')

            else:
                #--------------------------------------------------------------------------------------------------
                # Step 4.9 - Archive File Once Processed (runs only if no error occurred above)

                print('  ↓ Archiving source file: ')

                try:
                    src = os.path.join(output_dir, country_file)
                    dst = os.path.join(target_dir3, country_file)
                    shutil.move(src, dst)

                except OSError as e:
                    print('   ⃠   ERROR: Could not archive file. Check inputs and retry. \n')
                    print(e)

        #--------------------------------------------------------------
        # Memory Management
        
        print('  ↓ Cleaning memory... \n')

        df1 = pd.DataFrame()              # Rebinding drops the old dataframes so they can be garbage-collected
        df2 = pd.DataFrame()
        gc.collect()

        mem = psutil.virtual_memory()
        print("    - Virtual Memory Available: ", round(mem.available * 100 / mem.total, 2), "% \n")

        #--------------------------------------------------------------

        print('  ✓ SUCCESS: Country file processed!')
        
        #--------------------------------------------------------------
        # Pause between files (in seconds); no web API is called in this loop
        
        time.sleep(3.0)
    
    #--------------------------------------------------------------
    # Continue Loop
    
    except (OSError, ValueError) as e:                # OSError covers failures opening the input file
        print(e)
        continue
    
#--------------------------------------------------------------------------------------------------
# Cell Runtime

cell_end = time.time()
seconds = cell_end - cell_start
minutes, seconds = divmod(seconds, 60)
hours, minutes = divmod(minutes, 60)
days, hours = divmod(hours, 24)                # Weeks are not split out, so days include whole weeks


#-------------------------------------------------------------------------------------------------------------------------------
print('>>> Process complete. Check for errors and continue.')
print("  - Runtime (d:h:m:s):  {:0>2}d,{:0>2}h,{:0>2}m,{:05.2f}s".format(int(days),int(hours), int(minutes), seconds))


KernelInterrupted: Execution interrupted by the Jupyter kernel.
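
-----

The `read_csv` call in Step 4.2 assumes the 19 tab-separated columns named there. The same layout can be checked line by line with the standard `csv` module; the sketch below (hypothetical helper `parse_geonames_line`) uses the Step 4.2 column names, disables quote handling because the dump is plain tab-separated text, and keeps every value as a string, so a two-letter code such as `NA` (Namibia) is not turned into a missing value the way pandas' default NA parsing would.

```python
import csv

# Column names as used in Step 4.2
COLUMNS = ['geonameid', 'name', 'asciiname', 'alternatenames', 'latitude',
           'longitude', 'feature-class', 'feature-code', 'country-code',
           'alt-country-codes', 'admin-code-1', 'admin-code-2',
           'admin-code-3', 'admin-code-4', 'population',
           'elevation', 'dem', 'timezone', 'modification-date']


def parse_geonames_line(line):
    """Split one tab-separated GeoNames row into a dict keyed by COLUMNS."""
    fields = next(csv.reader([line.rstrip("\n")], delimiter="\t",
                             quoting=csv.QUOTE_NONE))
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))
```

Rows that raise `ValueError` here are the ones `on_bad_lines='skip'` would silently drop in pandas, so this can help spot files that belong in the `_errors` folder.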

-----
# 5.0 - Translate Placenames

In [None]:

# # NOT OPERATIONAL AT SCALE. BREAKS GOOGLE API.

# print('\n-------------------------------------------------------------------------------------------------------------------------------')
# print('5.0 - Translate Placenames: \n')

# """
# DESCRIPTION:
# This section translates the Geonames name field into Simplified Chinese (zh-cn) with googletrans
# and stores the result in the 'name-chinese' column of a copy of df2 (df3).

# INSTRUCTIONS:
# - Make sure a populated df2 dataframe is in memory before running.
# - Each row sends one request to the translation service, paced by time.sleep.

# """

# cell_start = time.time()

# #--------------------------------------------------------------------------------------------------
# # Prepare data

# # Create copy of dataframe
# print('  ↓ Copying dataframe...')
# df3 = df2.copy()


# # Create field for translated language
# print('  ↓ Creating translated name field...')
# df3['name-chinese'] = ''

# #--------------------------------------------------------------------------------------------------
# # Translate name field

# print('  ↓ Translating name field...')
# translatedList = []
# for index, row in df3.iterrows():

#     # REINITIALIZE THE GOOGLETRANS API
#     from googletrans import Translator
#     translator = Translator(service_urls = None, user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)', proxies = None, timeout = None)
#     newrow = copy.deepcopy(row)

#     try:
#         # Translate the 'name' value and store it in this row only
#         translated = translator.translate(row['name'], dest='zh-cn')
#         df3.at[index, 'name-chinese'] = translated.text
#         newrow['name-chinese'] = translated.text

#     except Exception as e:
#         print(str(e))
#         continue
#     translatedList.append(newrow)
#     time.sleep(1.5)                # Pause the code for (seconds)



# df3.head(10)

# #--------------------------------------------------------------------------------------------------
# # Cell Runtime

# cell_end = time.time()
# seconds = cell_end - cell_start
# minutes, seconds = divmod(seconds, 60)
# hours, minutes = divmod(minutes, 60)
# days, hours = divmod(hours, 24)                # Weeks are not split out, so days include whole weeks


# #-------------------------------------------------------------------------------------------------------------------------------
# print('>>> Process complete. Check for errors and continue.')
# print("  - Runtime (d:h:m:s):  {:0>2}d,{:0>2}h,{:0>2}m,{:05.2f}s".format(int(days),int(hours), int(minutes), seconds))
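
-----

Section 5.0 sends one translation request per row, which is part of why it does not scale. One mitigation, sketched below under the assumption that identical names translate identically, is to translate each distinct name once and map the results back. `translate_unique` is a hypothetical helper; `translate_fn` stands in for any single-string translation call, and `pause` mirrors the `time.sleep` between requests.

```python
import time


def translate_unique(names, translate_fn, pause=0.0):
    """Translate each distinct name once and return results in input order.

    Names whose translation raises an exception map to ''.
    """
    cache = {}
    for name in dict.fromkeys(names):            # distinct names, first-seen order
        try:
            cache[name] = translate_fn(name)
        except Exception as exc:                 # keep going if one request fails
            print(f"Translation failed for {name!r}: {exc}")
            cache[name] = ''
        if pause:
            time.sleep(pause)                    # stagger requests to the service
    return [cache[name] for name in names]
```

In the commented cell above, the row loop could then be replaced by `df3['name-chinese'] = translate_unique(df3['name'].tolist(), lambda n: translator.translate(n, dest='zh-cn').text, pause=1.5)`, which sends one request per distinct name instead of one per row.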


-----
# Code Metrics

In [None]:
print('\n-------------------------------------------------------------------------------------------------------------------------------')
print('Code Metrics: \n \n')

"""
DESCRIPTION:
This section calculates total processing time for entire code to complete.

INSTRUCTIONS:
- Run cell to generate and display metrics.

"""

code_end = time.time()
seconds = code_end - code_start
minutes, seconds = divmod(seconds, 60)
hours, minutes = divmod(minutes, 60)
days, hours = divmod(hours, 24)                # Weeks are not split out, so days include whole weeks


#------------------------------------------------------------------------------------------------------------------------------------
print("  >>> Runtime (d:h:m:s):  {:0>2}:{:0>2}:{:0>2}:{:05.2f}".format(int(days),int(hours), int(minutes), seconds))
print("  >>> Total Runtime:     ", int(days), "days,", int(hours), "hours,", int(minutes), "minutes,", int(seconds), "seconds \n \n")


print('-------------------------------------------------------------------------------------')
print('      Code complete. Check output for data accuracy and any errors.')
print('-------------------------------------------------------------------------------------')
      