This is a script repository I created during my work. It includes some documentation and images, without the confidential files. DISCLAIMER: no actual data is used here, and all sensitive data has been removed.
- Python 3.10
- Others are included in requirements.txt
git clone "https://github.com/mohgavin/code-repository"
cd code-repository
python3 -m venv .python-3.10
source .python-3.10/bin/activate
pip3 install -r requirements.txt
forecast --> A collection of Python scripts for forecasting traffic/active-user models (consisting of ARIMA, SARIMA, Holt-Winters, and Prophet). I use multiprocessing (spreading work across cores/threads) to parallelize the forecasting and speed up computation (comparison: 7 hours single-threaded vs. 1 hour with 12 cores/threads). Tested on a Ryzen 5 3600 with no other workload running.
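The parallelization idea can be sketched with the standard library alone. This is a minimal illustration, not the actual scripts: the real code fits ARIMA/SARIMA/Holt-Winters/Prophet models (e.g. via statsmodels or prophet), whereas here a naive seasonal forecaster stands in as a placeholder so the per-series fan-out across processes is visible. All names are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor

def seasonal_naive_forecast(series, season=7, horizon=7):
    """Placeholder model: repeat the last observed season.
    The real scripts fit ARIMA/SARIMA/Holt-Winters/Prophet instead."""
    last_season = series[-season:]
    return [last_season[i % season] for i in range(horizon)]

def forecast_all(series_by_cell, workers=12):
    """Fit one model per cell/site in parallel, one task per process.
    Each series is independent, so this scales close to linearly."""
    cells = list(series_by_cell)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(seasonal_naive_forecast,
                           (series_by_cell[c] for c in cells))
    return dict(zip(cells, results))

if __name__ == "__main__":
    demo = {"cell_a": list(range(28)), "cell_b": [5] * 28}
    print(forecast_all(demo, workers=2))
```

Because each cell's time series is fitted independently, swapping `ProcessPoolExecutor` for a single-threaded loop changes nothing but the wall-clock time, which is what the 7-hour vs. 1-hour comparison above reflects.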
As shown in the graph below, holidays have a visible impact on the active-user time series, appearing as either an upward or a downward shift in user traffic. Using these models, we can forecast how large the impact of a holiday or other special day will be in the near future.
calculate ISD.ipynb --> Algorithm to find multiple nearest points/polygons in CRS 3857; a maximum distance is required. The sjoin_nearest function from GeoPandas is not enough for my use case (it returns only the single nearest match). Consider forking the GeoPandas repository and contributing this to the library. I also create LineStrings connecting the nearest points, to measure their distances and display them in a GIS.
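The core idea (k nearest neighbours within a distance cap) can be sketched in plain Python. This is a brute-force illustration under the assumption that coordinates are in a projected CRS such as EPSG:3857, where distances are in metres; the actual notebook works on GeoPandas GeoDataFrames, and the function name is hypothetical.

```python
import math

def k_nearest_within(points, targets, k=3, max_distance=500.0):
    """For each point, return up to k nearest targets within max_distance
    metres (projected coordinates assumed, e.g. EPSG:3857).
    Brute force O(n*m); the notebook uses GeoPandas spatial joins at scale."""
    out = {}
    for pid, (px, py) in points.items():
        candidates = []
        for tid, (tx, ty) in targets.items():
            d = math.hypot(px - tx, py - ty)
            if d <= max_distance:
                candidates.append((d, tid))
        candidates.sort()
        out[pid] = [(tid, d) for d, tid in candidates[:k]]
    return out
```

Each returned (point, target) pair can then be turned into a two-vertex LineString for display in a GIS, which is what the notebook does with the matches.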
create Buffer Area.ipynb --> This script creates a buffer/polygon area around sites, used to limit the samples to those inside or intersecting the polygon.
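As a minimal sketch of the filtering step, a circular buffer reduces to a distance test when coordinates are projected (metres). The notebook builds real polygon buffers and intersections with GeoPandas; this stdlib stand-in, with hypothetical names, only shows the membership logic.

```python
import math

def within_buffer(samples, sites, radius=300.0):
    """Keep only samples that fall inside a circular buffer of `radius`
    metres around at least one site (projected coordinates assumed).
    The notebook does this with GeoPandas buffer() + intersects()."""
    kept = []
    for sx, sy in samples:
        if any(math.hypot(sx - cx, sy - cy) <= radius for cx, cy in sites):
            kept.append((sx, sy))
    return kept
```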
create_grid.ipynb --> Algorithm to create a custom rectangular grid in CRS 4326. The algorithm requires a lot of memory and computation; consider revisiting it in the future to simplify the steps.
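A sketch of the grid construction, assuming a bounding box and a cell size in degrees (CRS 4326). Yielding cells lazily instead of materializing the whole grid is one way to address the memory concern noted above; the function name and signature are hypothetical.

```python
def make_grid(min_lon, min_lat, max_lon, max_lat, step=0.01):
    """Yield rectangular grid cells as (min_lon, min_lat, max_lon, max_lat)
    tuples covering the bounding box; `step` is in degrees (CRS 4326).
    A generator keeps memory flat even for very large areas."""
    lat = min_lat
    while lat < max_lat:
        lon = min_lon
        while lon < max_lon:
            yield (lon, lat,
                   min(lon + step, max_lon),   # clamp the last column/row
                   min(lat + step, max_lat))   # to the bounding box edge
            lon += step
        lat += step
```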
dask-process_CR_XLArea.py --> Script collection to query and intersect points inside the polygons of the MRT route around Senayan and Bundaran HI. These are meant to collect Measurement Reports (MR) at underground levels.
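The geometric core of that intersection is a point-in-polygon test. The actual script delegates this to GeoPandas/Shapely over Dask partitions; for illustration only, here is the classic ray-casting version in plain Python.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: cast a horizontal ray from (x, y) and count how
    many polygon edges it crosses; an odd count means the point is inside.
    `polygon` is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y-level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```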
big Query --> Script collection to query BigQuery SQL from the CellRebel crowdsource dataset. It requires a JSON key or Application Default Credentials from Google Cloud to access the data. To set up Application Default Credentials, you can access from here.
big_Data_Scripts --> Script collection to process smartphone UE location / signal power / throughput / signal quality / Measurement Report data from raw files (duration: 1-3 months; approximate size: hundreds of GB to TB). The data contain spatial attributes that can be categorized into spatial classifications such as Province / Kelurahan / Kecamatan / polygon level. Sample results are listed below. I use pandas, GeoPandas, Dask, PySpark, Sedona, and Airflow, combined to process the data automatically.
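At that data volume the key pattern is streaming aggregation: process the input in chunks so memory stays bounded, then combine partial results. The real pipeline does this with Dask/PySpark under Airflow; this stdlib sketch (with hypothetical column names) shows only the idea, computing mean signal power per region without loading the file into memory.

```python
import csv
from collections import defaultdict

def mean_power_per_region(csv_path):
    """Stream a (potentially huge) CSV of measurements row by row and
    compute the mean signal power per administrative region.
    Assumed columns: 'region' and 'rsrp_dbm' (hypothetical names)."""
    total = defaultdict(float)
    count = defaultdict(int)
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            total[row["region"]] += float(row["rsrp_dbm"])
            count[row["region"]] += 1
    return {region: total[region] / count[region] for region in total}
```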
parsing --> Script collection to extract bad grids/cell IDs for signal/throughput from the output of big_Data_Scripts. It requires the distance from the nearest site or network element, and filters on conditions such as signal below -105 dBm and/or a minimum number of samples. The output can be configured from MapInfo/vector files to CSV/XLSX/JSON files.
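The filtering condition above can be sketched as a small predicate over per-cell statistics. The -105 dBm threshold comes from the description; the input shape and names are hypothetical, since the real scripts read the big_Data_Scripts output files.

```python
def flag_bad_cells(cells, rsrp_threshold=-105.0, min_samples=50):
    """Return the IDs of cells whose average RSRP is below the threshold
    AND that have enough samples to be statistically trustworthy.
    `cells` maps cell_id -> (avg_rsrp_dbm, sample_count)."""
    return [cell_id for cell_id, (avg_rsrp, n) in cells.items()
            if avg_rsrp < rsrp_threshold and n >= min_samples]
```

Requiring a minimum sample count alongside the dBm threshold avoids flagging cells whose poor average comes from a handful of measurements.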
database_process --> I created this code to better display the radio equipment per sector. I built Windows executable files so the script can run on other computers without anyone having to install a Python environment.