# DNDS6288 - Scientific Python 2024/25 Fall

## Final project: Annual Analysis of Criminal Court Cases in Uzbekistan.
Done by Shohruh Sadullaev.

This project analyzes the publicly available criminal cases heard in Uzbekistan from January 2023 to December 2023. The official website of the judiciary (https://public.sud.uz/report/CRIMINAL) contains over 69,000 public cases from 2023, categorized by court type, instance, relevant article of the criminal code, name of judge, date, and region. The analysis focuses on identifying trends in the number of cases over time, exploring regional variations, and examining the types of courts handling these cases. Additionally, I will break down the most common crime categories, including bribery, theft, fraud, domestic violence, and other offences. Using statistical and visual analysis, the project looks for patterns in the criminal justice system and highlights notable findings related to case outcomes, crime types, and geographic differences across the country.


### **Research questions:**

#### **1. Case characteristics**
   - What are the most common types of criminal cases (e.g., theft, assault, drug-related offenses) within this year, and what are their respective frequencies?

#### **2. Temporal factors**
   - Are there spikes in certain types of crimes around holidays or major public events, and if so, which crimes are most affected?
   - Are there certain types of crimes that tend to occur more frequently in particular months or seasons within the year?
   - Does the crime rate follow any discernible weekly pattern within the year?

#### **3. Geographic factors**
   - How are different types of crimes geographically distributed, and are certain regions more prone to specific crimes?


## Part 1. Data Description

The judicial system in the Republic of Uzbekistan operates independently of the legislative and executive authorities, political parties, and other public associations. It consists of the Constitutional Court of the Republic of Uzbekistan; the Supreme Court of the Republic of Uzbekistan; military courts; regional (including Tashkent city) courts for civil and criminal cases; inter-district, district, and city courts for civil and criminal cases; and the Supreme Economic Court of the Republic of Uzbekistan, the Economic Court of the Republic of Karakalpakstan, and the regional and Tashkent city economic courts. The term of office of judges of civil, military, and economic courts is five years.


In August 2021, the Supreme Court announced the opening of a web database of court decisions that have entered into legal force. These decisions are published in a centralized database on the Supreme Court's website (https://public.sud.uz/report). The database aims to ensure the transparency of courts, make court decisions available to the public, increase the legal literacy of the population, and prevent offenses. Publication is mandatory for decisions of economic courts, but not for criminal and civil cases. Depending on the consent of the parties, a decision is published either in full or with the parties' personal details removed. Decisions from closed court sessions are not posted online.


For this analysis, I cover only decisions from criminal courts: criminal cases in the Supreme Court, regional criminal courts, and district criminal courts. Using ethical web scraping, I retrieve all data from criminal courts for 2023 (1 Jan to 31 Dec). After inspecting the website in detail, I found the URL for direct requests to the backend: https://publication.sud.uz/criminal/findAll?size=500&page={page_number}&startDate={start_date}&endDate={end_date}.
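
The startDate and endDate parameters appear to be Unix timestamps in milliseconds: the hard-coded values used in Part 2, Step 2 decode to midnight at UTC+1. Below is a minimal sketch of how such bounds could be computed under that assumption. The helper name `to_epoch_ms` and its `utc_offset_hours` parameter are illustrative only and are not part of the site's API.

```python
from datetime import datetime, timedelta, timezone

def to_epoch_ms(year, month, day, utc_offset_hours=1):
    """Return midnight of the given date as Unix time in milliseconds.

    The UTC+1 default is an assumption inferred from the hard-coded
    values in the notebook, not something documented by the website.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return int(datetime(year, month, day, tzinfo=tz).timestamp() * 1000)

start_ms = to_epoch_ms(2023, 1, 1)
end_ms = to_epoch_ms(2023, 12, 31)
print(start_ms, end_ms)
```

Under this assumption the two values match the startDate and endDate used in the scraping function.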

The dataset has the following variables:

- claimId: unique ID generated by the database
- dbName: full name of the court, including region/district and type
- caseNumber: unique ID used by the judicial system
- claimArticles: articles of the legal code applied to the case
- judge: full name of the judge
- hearingDate: date on which the court session was held
- instance: 1: 'Биринчи инстанция' (first instance), 2: 'Апелляция инстанцияси' (appeal), 3: 'Тафтиш' (inspection), 4: 'Кассация инстанцияси' (cassation, the highest instance)
- claimDocumentType: verdict (guilty, not guilty, reconciliation, etc.)

For further analysis, I introduced three new variables derived from dbName: courtType (district, regional, supreme), region (12 regions, one republic, and the capital city), and district (220 units).

## Part 2. Data Collection

**Step 1:** Import the relevant libraries

In [None]:
import requests
import csv
import json
import time


**Step 2:** Function to scrape data from a single page (from 1-Jan-2023 to 31-Dec-2023)

In [None]:
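def scrape_page(page_number):
    # startDate/endDate are Unix timestamps in milliseconds:
    # 1672527600000 = 2022-12-31 23:00 UTC, 1703977200000 = 2023-12-30 23:00 UTC
    url = f'https://publication.sud.uz/criminal/findAll?size=500&page={page_number}&startDate=1672527600000&endDate=1703977200000'
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36", "Accept": "application/json"}
    # A timeout keeps the loop from hanging on an unresponsive server
    response = requests.get(url, headers=headers, timeout=30)
    if response.status_code == 200:
        # The 'data' field of the response body is itself a JSON-encoded string
        data = response.json()
        data = json.loads(data['data'])
        return data['content']
    else:
        print(f"Failed to fetch data from page {page_number}. Status code: {response.status_code}")
        return []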

**Step 3:** Function to convert JSON data to CSV format

In [None]:
def convert_to_csv(data, csv_file):
    with open(csv_file, 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ["id", "alteredId", "claimId", 'instance', "dbName", "caseNumber", "claimArticles", 'claimDocumentType', "judge", "hearingDate"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        writer.writeheader()
        for item in data:
            writer.writerow({
                "id": item["id"],
                "alteredId": item["alteredId"],
                "claimId": item["claimId"],
                "instance": item["instance"],
                "dbName": item["dbName"],
                "caseNumber": item["caseNumber"],
                # Join the list of articles into one '; '-separated string
                "claimArticles": '; '.join(item["claimArticles"]),
                "claimDocumentType": item["claimDocumentType"],
                "judge": item["judge"],
                "hearingDate": item["hearingDate"],
            })

**Step 4:** Scrape all pages with a 13-second delay between requests

In [None]:
# Set the range of pages you want to scrape
start_page = 0
end_page = 140

# Time delay between requests in seconds 
delay_seconds = 13

# Loop through the pages and scrape data
all_data = []
for page_number in range(start_page, end_page + 1):
    page_data = scrape_page(page_number)
    all_data.extend(page_data)

    # Introduce a delay between requests to avoid overloading the server
    time.sleep(delay_seconds)

# Convert the scraped data to CSV
csv_filename = f'scraped_data{start_page}-{end_page}.csv'
convert_to_csv(all_data, csv_filename)

print(f"Scraping completed. Data saved to {csv_filename}")
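
The loop above relies on a fixed page range. As a hedged alternative, the sketch below stops as soon as a page comes back empty. The function name `scrape_all` and its parameters are hypothetical, and a fake page source stands in for `scrape_page` so that the example runs offline.

```python
import time

def scrape_all(fetch_page, max_pages, delay_seconds=13, sleep=time.sleep):
    """Collect records page by page until a page comes back empty."""
    records = []
    for page_number in range(max_pages):
        page_data = fetch_page(page_number)
        if not page_data:
            break  # empty page: assume there is no more data
        records.extend(page_data)
        sleep(delay_seconds)  # pause between requests to avoid overloading the server
    return records

# Stand-in for scrape_page (made-up records): three pages, then nothing
fake_pages = {0: [{'claimId': 1}], 1: [{'claimId': 2}], 2: [{'claimId': 3}]}
demo = scrape_all(lambda page: fake_pages.get(page, []), max_pages=10, delay_seconds=0)
print(len(demo))
```

Note that `scrape_page` also returns an empty list when a request fails, so with the real function an early stop could mean either "no more pages" or "failed request"; checking the printed status messages would be needed to tell the two apart.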

## Part 3. Data Preprocessing

**Step 1:** Import modules

In [None]:
import pandas as pd
import plotly.express as px
from collections import Counter

**Step 2:** Load dataset

In [None]:
data = pd.read_csv('scraped_data0-140.csv')
data.drop_duplicates(subset=['claimId'], inplace=True)
data = data.drop(columns=['id','alteredId'])
data.head()

**Step 3:** Convert dates and sort by them

In [None]:
data['hearingDate'] = pd.to_datetime(data['hearingDate'], unit='ms').dt.date # Note: Consider making month-year for ethical purposes
data.sort_values(by='hearingDate', inplace=True)
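
The inline note above suggests coarsening dates to month-year. A minimal sketch of what that could look like on made-up millisecond timestamps; the `hearingMonth` column name is an illustrative choice, not part of the dataset:

```python
import pandas as pd

# Two made-up timestamps in epoch milliseconds (1 Jan 2023 and 1 Jun 2023, UTC)
sample = pd.DataFrame({'hearingDate': [1672531200000, 1685577600000]})

# Convert to datetimes, then keep only the year and month
dates = pd.to_datetime(sample['hearingDate'], unit='ms')
sample['hearingMonth'] = dates.dt.to_period('M').astype(str)
print(sample['hearingMonth'].tolist())
```

Coarsening this way would hide exact hearing days, so it would not be compatible with the weekly analysis later in the notebook.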

**Step 4:** Print dataset information

In [None]:
print("Dataset information:")
data.info()

**Step 5:** Remove NaN

In [None]:
# Check for missing values
data.isnull().sum()

In [None]:
data.dropna(subset=['claimArticles'], inplace=True)
data.shape

**Step 6:** Define regional courts

In [None]:
regional_courts = [
    "Самарқанд вилояти суди",
    "Қашқадарё вилояти суди",
    "Тошкент вилояти суди",
    "Сурхондарё вилояти суди",
    "Тошкент шаҳар суди",
    "Сирдарё вилояти суди",
    "Андижон вилояти суди",
    "Фарғона вилояти суди",
    "Бухоро вилояти суди",
    "Наманган вилояти суди",
    "Қорақалпоғистон Республикаси суди",
    "Навоий вилояти суди",
    "Хоразм вилояти суди",
    "Жиззах вилояти суди"
]

**Step 7:** Function to determine the type of court (supreme, regional, district)

In [None]:
def determine_type(value):
    if value == 'Олий суд':
        return 'supreme'
    elif value in regional_courts:
        return 'regional'
    else:
        return 'district'

data['courtType'] = data['dbName'].apply(determine_type)

**Step 8:** Function to determine the instance

In [None]:
def determine_instance(value):
    if value == 1:
        return 'Биринчи инстанция'
    elif value == 2:
        return 'Апелляция инстанцияси'
    elif value == 3:
        return 'Тафтиш'
    elif value == 4:
        return 'Кассация инстанцияси'

data['instance'] = data['instance'].apply(determine_instance)
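
The same recoding can be sketched with a dictionary and `Series.map`, which avoids the chain of `elif` branches. The name `INSTANCE_LABELS` and the sample codes are illustrative; unlike the function above, codes missing from the dictionary become NaN rather than None.

```python
import pandas as pd

INSTANCE_LABELS = {
    1: 'Биринчи инстанция',      # First instance
    2: 'Апелляция инстанцияси',  # Appeal
    3: 'Тафтиш',                 # Inspection
    4: 'Кассация инстанцияси',   # Cassation
}

# Made-up codes, including one (7) that has no label
sample = pd.Series([1, 4, 2, 7])
labels = sample.map(INSTANCE_LABELS)
print(labels.tolist())
```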

**Step 9:** Define regions and their districts

Uzbekistan is administratively divided into 12 regions, the capital city of Tashkent, and the Republic of Karakalpakstan (an autonomous entity with its own constitution and justice system). Each has a regional appellate court and district (first instance) courts. 

In [None]:
region = {
    'Republic of Uzbekistan': ['Олий суд'],
    'Samarqand Region': ['Самарқанд вилоят', 'Булунғур туман', 'Жомбой туман', 'Иштихон туман', 'Каттақўрғон туман', 'Каттақўрғон шаҳар', 'Қўшрабод туман', 'Нарпай туман', 'Нуробод туман', 'Оқдарё туман', 'Пайариқ туман', 'Пастдарғом туман', 'Пахтачи туман', 'Самарқанд туман', 'Самарқанд шаҳар', 'Тойлоқ туман', 'Ургут туман'],
    'Qashqadaryo Region': ['Қашқадарё вилоят', 'Ғузор туман', 'Деҳқонобод туман', 'Қамаши туман', 'Қарши туман', 'Қарши шаҳар', 'Касби туман', 'Китоб туман', 'Косон туман', 'Кўкдала туман', 'Миришкор туман', 'Муборак туман', 'Нишон туман', 'Чироқчи туман', 'Шаҳрисабз туман', 'Шахрисабз шаҳар', 'Яккабоғ туман'],
    'Tashkent Region': ['Тошкент вилоят', 'Ангрен шаҳар', 'Бекобод туман', 'Бекобод шаҳар', 'Бўка туман', 'Бўстонлиқ туман', 'Зангиота туман', 'Қибрай туман', 'Қуйи Чирчиқ туман', 'Нурафшон шаҳар', 'Оққўрғон туман', 'Олмалиқ шаҳар', 'Оҳангарон туман', 'Оҳангарон шаҳар', 'Паркент туман', 'Пискент туман', 'Тошкент туман', 'Ўрта Чирчиқ туман', 'Чиноз туман', 'Чирчиқ шаҳар', 'Юқори Чирчиқ туман', 'Янгийўл туман', 'Янгийўл шаҳар'],
    'Surxondaryo Region': ['Сурхондарё вилоят', 'Ангор туман', 'Бандихон туман', 'Бойсун туман', 'Денов туман', 'Жарқўрғон туман', 'Қизириқ туман', 'Қумқўрғон туман', 'Музработ туман', 'Олтинсой туман', 'Сариосиё туман', 'Термиз туман', 'Термиз шаҳар', 'Узун туман', 'Шеробод туман', 'Шўрчи туман'],
    'Tashkent City': ['Тошкент шаҳар', 'Бектемир туман', 'Мирзо Улуғбек туман', 'Миробод туман', 'Олмазор туман', 'Сергели туман', 'Учтепа туман', 'Чилонзор туман', 'Шайхантохур туман', 'Юнусобод туман', 'Яккасарой туман', 'Янгиҳаёт туман', 'Яшнобод туман'],
    'Sirdaryo Region': ['Сирдарё вилоят', 'Боёвут туман', 'Гулистон туман', 'Гулистон шаҳар', 'Мирзаобод туман', 'Оқолтин туман', 'Сайхунобод туман', 'Сардоба туман', 'Сирдарё туман', 'Ховос туман', 'Ширин шаҳар', 'Янгиер шаҳар'],
    'Andijan Region': ["Андижон вилоят", "Андижон туман", "Андижон шаҳар", "Асака туман", "Балиқчи туман", "Булоқбоши туман", "Бўстон туман", 'Бўз туман',"Жалақудуқ туман", "Избоскан туман", "Қўрғонтепа туман", "Марҳамат туман", "Олтинкўл туман", "Пахтаобод туман", "Улуғнор туман", "Хонобод шаҳар", "Хўжаобод туман", "Шаҳрихон туман"],
    'Fergana Region': ['Фарғона вилоят', 'Бешариқ туман', 'Боғдод туман', 'Бувайда туман', 'Данғара туман', 'Ёзъёвон туман', 'Қувасой шаҳар', 'Қува туман', 'Қўқон шаҳар', 'Қўштепа туман', 'Марғилон шаҳар', 'Олтиариқ туман', 'Риштон туман', 'Сўх туман', 'Тошлоқ туман', 'Учкўприк туман', 'Ўзбекистон туман', 'Фарғона туман', 'Фарғона шаҳар', 'Фурқат туман'],
    'Bukhara Region': ['Бухоро вилоят', 'Бухоро туман', 'Бухоро шаҳар', 'Вобкент туман', 'Ғиждувон туман', 'Жондор туман', 'Когон туман', 'Когон шаҳар', 'Қоракўл туман', 'Қоровулбозор туман', 'Олот туман', 'Пешку туман', 'Ромитан туман', 'Шофиркон туман'],
    'Namangan Region': ['Наманган вилоят', 'Косонсой туман', 'Мингбулоқ туман', 'Наманган туман', 'Наманган шаҳар', 'Норин туман', 'Поп туман', 'Тўрақўрғон туман', 'Уйчи туман', 'Учқўрғон туман', 'Чортоқ туман', 'Чуст туман', 'Янгиқўрғон туман'],
    'Republic of Karakalpakstan': ['Қорақалпоғистон Республикаси', 'Амударё туман', 'Беруний туман', 'Бўзатов туман', 'Кегейли туман', 'Қонликўл туман', 'Қораўзак туман', 'Қўнғирот туман', 'Мўйноқ туман', 'Нукус туман', 'Нукус шаҳар', 'Тахиатош туман', 'Тахтакўпир туман', 'Тўрткўл туман', 'Хўжайли туман', 'Чимбой туман', 'Шуманай туман', 'Элликқалъа туман'],
    'Navoiy Region': ['Навоий вилоят', 'Зарафшон шаҳар', 'Кармана туман', 'Қизилтепа туман', 'Конимех туман', 'Навбаҳор туман', 'Навоий шаҳар', 'Нурота туман', 'Томди туман', 'Учқудуқ туман', 'Хатирчи туман'],
    'Xorazm Region': ['Хоразм вилоят', 'Боғот туман', 'Гурлан туман', 'Қўшкўпир туман', 'Тупроққалъа туман', 'Урганч туман', 'Урганч шаҳар', 'Хазорасп туман', 'Хива туман', 'Хива шаҳар', 'Хонқа туман', 'Шовот туман', 'Янгиариқ туман', 'Янгибозор туман'],
    'Jizzakh Region': ['Жиззах вилоят', 'Арнасой туман', 'Бахмал туман', 'Ғаллаорол туман', 'Дўстлик туман', 'Жиззах шаҳар', 'Зарбдор туман', 'Зафаробод туман', 'Зомин туман', 'Мирзачўл туман', 'Пахтакор туман', 'Фориш туман', 'Ш.Рашидов туман', 'Янгиобод туман']
}

**Step 10:** Find matching region/district

In [None]:
# Build a reverse lookup dictionary to map districts to regions
district_to_region = {v: key for key, values in region.items() for v in values}

# Function to find the matching region using the reverse lookup dictionary
def determine_region(value):
    for district in district_to_region:
        if district in value:
            return district_to_region[district]
    return None

# Function to find the matching district
def determine_district(value):
    for district in district_to_region:
        if district in value:
            return district
    return None


data['region'] = data['dbName'].apply(determine_region)
data['district'] = data['dbName'].apply(determine_district)
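
To illustrate how the substring lookup behaves, here is a small self-contained sketch on a reduced mapping. The names `region_demo` and `match_court` are hypothetical, and the last two court names are made-up examples of what a dbName might look like, not values taken from the dataset.

```python
region_demo = {
    'Republic of Uzbekistan': ['Олий суд'],
    'Samarqand Region': ['Самарқанд вилоят', 'Ургут туман'],
}
# Reverse lookup: district key -> region name
district_to_region_demo = {d: r for r, districts in region_demo.items() for d in districts}

def match_court(db_name):
    """Return (district, region) for the first key found in db_name, else (None, None)."""
    for district, region_name in district_to_region_demo.items():
        if district in db_name:
            return district, region_name
    return None, None

samples = [
    'Олий суд',
    'Самарқанд вилояти суди',
    'Жиноят ишлари бўйича Ургут туман суди',  # hypothetical name format
    'Номаълум суд',                            # hypothetical unmatched name
]
results = [match_court(name) for name in samples]
for name, result in zip(samples, results):
    print(name, '->', result)
```

Because matching is by substring and the first hit wins, the order of the keys matters whenever one key can occur inside the name of another court. Returning both values from one pass also avoids scanning the mapping twice per row.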

**Step 11:** Group articles into categories (I manually grouped the 70 most frequent articles into assumed categories, using translations from ChatGPT)

In [None]:
offenses = {
    "Economic Crimes": [  
        "Фирибгарлик",  # Fraud
        "Ўзлаштириш ёки растрата йўли билан талон-торож қилиш",  # Embezzlement or Misappropriation
        "Ўғрилик",  # Theft
        "Контрабанда",  # Smuggling
        "Ҳужжатлар, штамплар, муҳрлар, бланкалар тайёрлаш, уларни қалбакилаштириш, сотиш ёки улардан фойдаланиш",  # Forgery of Documents, Stamps, Seals
        "Солиқлар ёки бошқа мажбурий тўловларни тўлашдан бўйин товлаш",  # Tax Evasion
        "Валюта қимматликларини қонунга хилоф равишда олиш ёки ўтказиш",  # Smuggling Currency
        "Сохта тадбиркорлик",  # Illegal Entrepreneurial Activity
        "Жиноий фаолиятдан олинган даромадларни легаллаштириш",  # Money Laundering
        "Банд солинган мулкни қонунга хилоф равишда тасарруф этиш",  # Illegal Disposal of Pledged Property
        "Божхона тўғрисидаги қонун ҳужжатларини бузиш",  # Customs Violations
        "Савдо ёки хизмат кўрсатиш қоидаларини бузиш",  # Illegal Economic Activity
        "Этил спирти, алкоголли маҳсулот ва тамаки маҳсулотини қонунга хилоф равишда ишлаб чиқариш ёки муомалага киритиш",  # Illegal Production or Circulation of Alcohol and Tobacco
        "Қалбаки пул, акциз маркаси ёки қимматли қоғозлар ясаш, уларни ўтказиш",  # Counterfeit Money, Stamps, or Securities
        "Ҳужжатлар, штамплар, муҳрлар, бланкалар тайёрлаш, уларни қалбакилаштириш, сотиш ёки улардан фойдаланиш"  # Forgery of Documents, Stamps, Seals (Repeated)
    ],
    "Corruption and Abuse of Power": [  # Коррупция ва ваколатни суиистеъмол қилиш
        "Пора олиш",  # Bribery
        "Пора бериш",  # Giving Bribes
        "Ҳокимият ёки мансаб ваколатини суиистеъмол қилиш",  # Abuse of Power or Office
        "Мансабга совуққонлик билан қараш",  # Neglect of Duties
        "Мансаб сохтакорлиги",  # Office Fraud
        "Касб юзасидан ўз вазифаларини лозим даражада бажармаслик",  # Neglect of Professional Duties
        "Ҳокимият ёки мансаб ваколати доирасидан четга чиқиш"  # Misuse of Official Power
    ],
    "Crimes Against Individuals": [  # Шахсга қарши жиноятлар
        "Қасддан одам ўлдириш",  # Intentional Murder
        "Қасддан баданга енгил шикаст етказиш",  # Intentional Bodily Harm (Light)
        "Қасддан баданга ўртача оғир шикаст етказиш",  # Intentional Bodily Harm (Moderate)
        "Қасддан баданга оғир шикаст етказиш",  # Intentional Bodily Harm (Severe)
        "Зўрлик ишлатиб ғайриқонуний равишда озодликдан маҳрум қилиш",  # Illegal Detainment with Violence
        "Ўлдириш ёки зўрлик ишлатиш билан қўрқитиш",  # Threatening Murder or Violence
        "Эҳтиётсизлик орқасида баданга ўртача оғир ёки оғир шикаст етказиш",  # Bodily Harm by Negligence
        "Хавф остида қолдириш",  # Leaving in Danger
        "Ёлғон гувоҳлик бериш",  # Perjury
        "Туҳмат"  # Slander
    ],
    "Crimes Against Property": [  # Мулкка қарши жиноятлар
        "Босқинчилик",  # Robbery
        "Товламачилик",  # Extortion
        "Талончилик",  # Plundering
        "Ер участкаларини ўзбошимчалик билан эгаллаб олиш",  # Illegal Appropriation of Land
        "Транспорт воситасини олиб қочиш",  # Car Theft
        "Ҳужжатлар, штамплар, муҳрлар, бланкаларни, автомототранспорт воситаларининг ва улар тиркамаларининг (ярим тиркамаларининг) давлат рақам белгиларини эгаллаш, нобуд қилиш, уларга шикаст етказиш ёки уларни яшириш",  # Vandalism of Documents and License Plates
        "Банд солинган мулкни қонунга хилоф равишда тасарруф этиш"  # Destruction or Concealment of Seized Property
    ],
    "Drug-Related Crimes": [  # Наркотик моддаларга алоқадор жиноятлар
        "Гиёвандлик воситалари ёки психотроп моддаларни ўтказиш мақсадини кўзламай қонунга хилоф равишда тайёрлаш, эгаллаш, сақлаш ва бошқа ҳаракатлар",  # Illegal Drug Possession
        "Гиёвандлик воситалари ёки психотроп моддаларни ўтказиш мақсадини кўзлаб қонунга хилоф равишда тайёрлаш, олиш, сақлаш ва бошқа ҳаракатлар қилиш, шунингдек уларни қонунга хилоф равишда ўтказиш",  # Illegal Drug Trafficking
        "Тақиқланган экинларни етиштириш",  # Cultivating Prohibited Crops
        "Кучли таъсир қилувчи ёки заҳарли моддаларни қонунга хилоф равишда муомалага киритиш"  # Illegal Circulation of Potent or Toxic Substances
    ],
    "Crimes Against Public Safety and Order": [  # Жамоат хавфсизлиги ва тартибига қарши жиноятлар
        "Безорилик",  # Hooliganism
        "Жамоат хавфсизлиги ва жамоат тартибига таҳдид соладиган материалларни тайёрлаш, сақлаш, тарқатиш ёки намойиш этиш",  # Threat to Public Safety
        "Маъмурий назорат қоидаларини бузиш",  # Violating Public Supervision Rules
        "Қурол, ўқ-дорилар, портловчи моддалар ёки портлатиш қурилмаларига қонунга хилоф равишда эгалик қилиш",  # Illegal Possession of Weapons or Explosives
        "Ҳокимият вакилига ёки фуқаровий бурчини бажараётган шахсга қаршилик кўрсатиш",  # Resisting Authorities or Civilians Fulfilling Duties
        "Электр, иссиқлик энергияси, газ, водопроводдан фойдаланиш қоидаларини бузиш",  # Violating Utility Usage Rules (Electricity, Heat, Gas, Water)
        "Қўшмачилик қилиш ёки фоҳишахона сақлаш",  # Prostitution or Maintaining Brothels
        "Вояга етмаган шахсни ғайриижтимоий хатти-ҳаракатларга жалб қилиш",  # Engaging Minors in Anti-Social Behavior
        "Қимор ва таваккалчиликка асосланган бошқа ўйинларни ташкил этиш ҳамда ўтказиш",  # Organizing Gambling or Risk-Based Games
        "Жиноят ҳақида хабар бермаслик ёки уни яшириш"  # Failure to Report or Concealment of Crime
    ],
    "Crimes Against the State and Its Borders": [  # Давлатга қарши жиноятлар
        "Қонунга хилоф равишда чет элга чиқиш ёки Ўзбекистон Республикасига кириш",  # Illegal Crossing of State Borders
        "Ўзбекистон Республикасининг конституциявий тузумига тажовуз қилиш",  # Attack on Constitutional Order of Uzbekistan
        "Диний экстремистик, сепаратистик, фундаменталистик ёки бошқа тақиқланган ташкилотлар тузиш, уларга раҳбарлик қилиш, уларда иштирок этиш",  # Involvement in Extremist, Separatist, or Banned Organizations
    ],
    "Family-Related Crimes": [  # Оилавий
        "Вояга етмаган ёки меҳнатга лаёқатсиз шахсларни моддий таъминлашдан бўйин товлаш",  # Failure to Fulfill Obligations to Minors or Incapacitated Persons
        "Оилавий (маиший) зўравонлик"  # Domestic Violence
    ]
}

In [None]:
def determine_category(article_text):
    labels = []
    for category, keywords in offenses.items():
        for keyword in keywords:
            if keyword in article_text:
                labels.append(category)
                break  # Stop checking once we find a match for this category
    return '; '.join(labels) if labels else 'Other'

# Apply the function to each row
data['categories'] = data['claimArticles'].apply(determine_category)
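
The labelling rule can be tried on a reduced keyword dictionary. This is only a sketch: `offenses_demo` and `label_categories` are illustrative names, and the article strings are short examples rather than real rows.

```python
offenses_demo = {
    'Economic Crimes': ['Фирибгарлик', 'Ўғрилик'],    # Fraud, Theft
    'Corruption and Abuse of Power': ['Пора олиш'],   # Bribery
}

def label_categories(article_text, offenses):
    """Return every category with at least one keyword in the text, or 'Other'."""
    labels = [
        category
        for category, keywords in offenses.items()
        if any(keyword in article_text for keyword in keywords)
    ]
    return '; '.join(labels) if labels else 'Other'

single = label_categories('Ўғрилик', offenses_demo)
multi = label_categories('Фирибгарлик; Пора олиш', offenses_demo)
unmatched = label_categories('Безорилик', offenses_demo)
print(single, '|', multi, '|', unmatched)
```

A case that cites articles from several categories receives several labels, which is why the later steps split the `categories` column on ';' before counting.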

## Part 4. EDA

**Step 1:** Analyze distribution by types of courts

In [None]:
types_frequency = Counter(data['courtType'])
types_frequency

**Step 2:** Analyze regional distribution

In [None]:
region_frequency = Counter(data['region'])
region_frequency

**Step 3:** Analyze district distribution

In [None]:
district_frequency = Counter(data['district'])
len(district_frequency)

**Step 4:** Analyze distribution by articles

In [None]:
articles = data['claimArticles'].apply(lambda x: [item.strip() for item in x.split(';')])
articles = [article for sublist in articles for article in sublist]
articles_frequency = Counter(articles)
len(articles_frequency)
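
The same per-article count can also be sketched with pandas alone, using `str.split`, `explode`, and `value_counts`. The sample rows below are made up for illustration.

```python
import pandas as pd

# Made-up rows: each string lists the articles applied to one case
sample = pd.DataFrame({
    'claimArticles': ['Ўғрилик; Фирибгарлик', 'Ўғрилик', 'Безорилик; Ўғрилик']
})

article_counts = (
    sample['claimArticles']
    .str.split(';')      # one list of articles per case
    .explode()           # one row per (case, article) pair
    .str.strip()         # drop the space left after each ';'
    .value_counts()      # frequency of each article, sorted descending
)
print(article_counts)
```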

**Step 5:** Analyze categories of articles

In [None]:
category_counts = Counter([category.strip() for sublist in data['categories'].str.split(';') for category in sublist])
category_counts

**Step 6:** Analyze distribution by judges

In [None]:
judges = data['judge'].dropna().str.strip()  # skip missing names instead of failing on NaN
judges_frequency = Counter(judges)
len(judges_frequency)

**Step 7:** Analyze distribution by instances

In [None]:
instance_frequency = Counter(data['instance'])
instance_frequency

**Step 8:** Analyze distribution by outcomes

In [None]:
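import ast  # literal_eval parses the stringified lists without executing arbitrary code

claim_types = [ast.literal_eval(i) for i in data['claimDocumentType']]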
claim_types = [article for sublist in claim_types for article in sublist]
claim_frequency = Counter(claim_types)
len(claim_frequency)

## Part 5. Visualization

### Visualization 1: By category

**Step 1:** Generate frequencies of categories

In [None]:
category_counts = Counter([category.strip() for sublist in data['categories'].str.split(';') for category in sublist])

**Step 2:** Sort frequencies of categories

In [None]:
category_counts = dict(sorted(category_counts.items(), key=lambda x: x[1], reverse=True))

**Step 3:** Create a plot of frequencies

In [None]:
fig = px.bar(x=list(category_counts.keys()), y=list(category_counts.values()),
             labels={'x': 'Categories', 'y': 'Total Frequency'},
             title='Total Frequency of Offenses by Category')

# Show the plot
fig.show()

#### Description:
- Purpose: This visualization displays the total frequency of various offense categories, highlighting which types of crimes are most prevalent.
- X-Axis: Categories of offenses (displayed at an angle for clarity)
- Y-Axis: Total Frequency of offenses in each category
- Bar Chart: Each bar represents the cumulative number of offenses within a specific category.

#### Key Insights:
The visualization shows that **Economic Crimes** are by far the most frequent category, significantly exceeding other types of offenses. This high frequency may reflect a focus on economic regulations and financial oversight. Other offense categories, such as **Crimes Against Individuals**, **Drug-Related Crimes**, and **Corruption and Abuse of Power**, also show notable counts but are much lower in comparison. Less frequent categories, such as **Crimes Against Public Safety and Order** and **Family-Related Crimes**, have relatively low counts, suggesting that these offenses are either less common or are enforced and recorded differently. This distribution gives an indication of the areas of legal focus and crime prevention priorities in the country.

It’s important to note that these categories were manually created by focusing on the top 70 articles with the highest frequencies, while articles with lower counts were excluded. As a result, the categories and their associated articles are not comprehensive and represent only a subset of offenses. This selection provides a focused view of common crime types but may omit less frequent offenses, which would appear in a more complete categorization.

### Visualization 2: By week

**Step 1:** Get dataframe without weekends

In [None]:
weekly_rate = data[pd.to_datetime(data['hearingDate']).dt.weekday < 5].copy()

**Step 2:** Expand 'categories' column by splitting and exploding

In [None]:
weekly_rate['categories'] = weekly_rate['categories'].str.split(';')
weekly_rate = weekly_rate.explode('categories')
weekly_rate['categories'] = weekly_rate['categories'].str.strip()  # Remove any extra spaces

**Step 3:** Group by day and categories

In [None]:
weekly_rate = weekly_rate.groupby(['hearingDate', 'categories']).size().reset_index(name='case_count')

**Step 4:** Define holidays (source: https://www.officeholidays.com/countries/uzbekistan/2023)

In [None]:
weekly_rate['hearingDate'] = pd.to_datetime(weekly_rate['hearingDate'])
holidays = [
    "2023-01-01", "2023-01-02", "2023-01-14", "2023-03-08", "2023-03-20", "2023-03-21", "2023-04-22", 
    "2023-05-09", "2023-06-01", "2023-06-29", "2023-09-01", "2023-10-01", "2023-12-08"
]
holidays = pd.to_datetime(holidays)
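
Since weekends were already dropped in Step 1, only the holidays that fall on weekdays can affect the counts. A quick sketch, using the same list of dates, of how to check which holidays land on a Saturday or Sunday; the variable name `weekend_holidays` is illustrative:

```python
import pandas as pd

holiday_dates = pd.to_datetime([
    "2023-01-01", "2023-01-02", "2023-01-14", "2023-03-08", "2023-03-20",
    "2023-03-21", "2023-04-22", "2023-05-09", "2023-06-01", "2023-06-29",
    "2023-09-01", "2023-10-01", "2023-12-08",
])

# weekday: Monday=0 ... Sunday=6, so values >= 5 are weekend days
weekend_holidays = holiday_dates[holiday_dates.weekday >= 5]
print(weekend_holidays.strftime('%Y-%m-%d').tolist())
```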

**Step 5:** Replace holiday dates with averages within each category

In [None]:
import math

def replace_holidays_with_average(df, holidays, date_column='hearingDate', value_column='case_count', category_column='categories'):
    # Iterate over each unique category in the specified category column
    for category in df[category_column].unique():
        # Filter the DataFrame for the current category
        df_category = df[df[category_column] == category]

        for holiday in holidays:
            if holiday in df_category[date_column].values:
                # Identify the previous and next non-holiday dates
                prev_date = holiday - pd.Timedelta(days=1)
                next_date = holiday + pd.Timedelta(days=1)

                # Search for the nearest non-holiday dates and their values within the same category
                while prev_date in holidays:
                    prev_date -= pd.Timedelta(days=1)
                while next_date in holidays:
                    next_date += pd.Timedelta(days=1)

                # Get the values for averaging within the same category
                prev_value = df_category.loc[df_category[date_column] == prev_date, value_column].values
                next_value = df_category.loc[df_category[date_column] == next_date, value_column].values

                # Calculate the average of non-holiday surrounding dates if both exist
                if prev_value.size > 0 and next_value.size > 0:
                    avg_value = (prev_value[0] + next_value[0]) / 2
                    avg_value = math.ceil(avg_value)
                    # Replace the holiday value with the average
                    df.loc[(df[date_column] == holiday) & (df[category_column] == category), value_column] = avg_value
                elif prev_value.size > 0:
                    # If only previous value is available
                    df.loc[(df[date_column] == holiday) & (df[category_column] == category), value_column] = prev_value[0]
                elif next_value.size > 0:
                    # If only next value is available
                    df.loc[(df[date_column] == holiday) & (df[category_column] == category), value_column] = next_value[0]

    return df

# Apply the function to replace holiday dates with averages within each category
weekly_rate = replace_holidays_with_average(weekly_rate, holidays)
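
The core of the replacement rule, reduced to a single category and a single holiday, can be sketched as follows. The counts are made up, and this simplified version assumes both neighboring days are present in the data.

```python
import math
import pandas as pd

# Made-up daily counts around 8 March (a public holiday)
counts = pd.Series(
    [10, 2, 15],
    index=pd.to_datetime(['2023-03-07', '2023-03-08', '2023-03-09']),
)

holiday = pd.Timestamp('2023-03-08')
prev_day = holiday - pd.Timedelta(days=1)
next_day = holiday + pd.Timedelta(days=1)

# Replace the holiday count with the rounded-up mean of its two neighbors
counts.loc[holiday] = math.ceil((counts.loc[prev_day] + counts.loc[next_day]) / 2)
print(counts.loc[holiday])
```

The full function above additionally skips over consecutive holidays and falls back to a single neighbor when only one side has data.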

**Step 6:** Create new column with week

In [None]:
weekly_rate['year-week'] = pd.to_datetime(weekly_rate['hearingDate']).dt.to_period('W').astype(str) #.dt.isocalendar().week

**Step 7:** Group by 'year-week' and 'categories' and sum the case counts

In [None]:
weekly_rate = weekly_rate.groupby(['year-week', 'categories'])['case_count'].sum().reset_index()
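
For reference, here is a small sketch of what the week labels created in Step 6 look like. With pandas' default weekly frequency, each period runs from Monday to Sunday, and the label is the string 'start/end'. The sample dates are arbitrary.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2023-03-06', '2023-03-08', '2023-03-12', '2023-03-13']))

# Monday 6 March through Sunday 12 March share one label; 13 March starts the next week
week_labels = dates.dt.to_period('W').astype(str)
print(week_labels.tolist())
```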

**Step 8:** Plot

In [None]:
fig = px.line(
    weekly_rate,
    x='year-week',
    y='case_count',
    color='categories',
    labels={'year-week': 'Week', 'case_count': 'Number of Court Cases', 'categories': 'Categories'},
    title='Weekly Court Case Rate by Category (No Weekends & Holidays)'
)

# Show the plot
fig.show()

#### Description:
- Purpose: This visualization aims to illustrate weekly fluctuations in court case rates across specific dates, offering a clear view of trends and variations with different crime types.
- X-Axis: Week
- Y-Axis: Number of Court Cases
- Variations:
  - Different crime categories

#### Key Insights:
The visualization shows weekly fluctuations in court cases for each crime category, but no observable long-term trends, seasonal patterns, or cyclical variations. A likely reason is that hearings are scheduled according to court availability and administrative processes, so the rates remain stable over time apart from holiday-related fluctuations. The occasional drops in weekly case counts correspond to weeks with public holidays (1 January, 8 March, 21 March, 22 April (Ramadan Eid), 9 May, 29 June (Kurban Eid), 1 October, 8 December). These drops remain even though weekends are removed and holiday counts are replaced with averages of the neighboring days; I assume judges have a lighter schedule close to holidays.

### Visualization 3: By region

**Step 1:** Get dataframe without Supreme Court


In [None]:
category_region_count = data[data['dbName']!='Олий суд'].copy()

**Step 2:** Expand 'categories' column by splitting and exploding

In [None]:
category_region_count['categories'] = category_region_count['categories'].str.split(';')
category_region_count = category_region_count.explode('categories')
category_region_count['categories'] = category_region_count['categories'].str.strip()  # Remove any extra spaces


**Step 3:** Group by 'categories' and 'region' and count occurrences

In [None]:
category_region_count = category_region_count.groupby(['categories', 'region']).size().reset_index(name='case_count')

**Step 4:** Sort by case_count

In [None]:
category_region_count.sort_values(by=['case_count'], ascending=False, inplace=True)

**Step 5:** Plot

In [None]:
# Create the absolute number plot
fig1 = px.bar(
    category_region_count,
    x='region',
    y='case_count',
    color='categories',
    labels={'region': 'Region', 'case_count': 'Number of Cases', 'categories': 'Categories'},
    title='Number of Cases per Category by Region (Absolute Numbers)'
)

# Normalize case counts within each region
category_region_count_normalized = category_region_count.copy()
category_region_count_normalized['case_count'] = category_region_count_normalized.groupby('region')['case_count'].transform(lambda x: x / x.sum())

# Create the normalized percentage plot
fig2 = px.bar(
    category_region_count_normalized,
    x='region',
    y='case_count',
    color='categories',
    labels={'region': 'Region', 'case_count': 'Proportion of Cases', 'categories': 'Categories'},
    title='Proportion of Cases per Category by Region (Normalized Percentages)',
    text_auto='.2%'
)

# Show both figures
fig1.show()
fig2.show()
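
The within-region normalization used for the second figure can be checked on a toy table. The region names and counts below are made up, and the `share` column is an illustrative name.

```python
import pandas as pd

# Made-up counts for two placeholder regions
counts = pd.DataFrame({
    'region': ['A', 'A', 'B', 'B'],
    'categories': ['Economic Crimes', 'Other', 'Economic Crimes', 'Other'],
    'case_count': [30, 10, 5, 15],
})

# Divide each count by the total of its region so shares sum to 1 per region
region_totals = counts.groupby('region')['case_count'].transform('sum')
counts['share'] = counts['case_count'] / region_totals
print(counts)
```

Normalizing this way makes regions of very different size comparable, at the cost of hiding the absolute number of cases, which is why both figures are shown.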


#### Description:
- Purpose: This visualization displays the relative frequency/proportion of different crime types across different regions, providing insight into regional differences in case rates.
- X-Axis: Region (with region names displayed at an angle for readability)
- Y-Axis: Number/Proportion of cases 
- Each bar represents the number/proportion of cases per crime category in a specific region.


#### Key Insights:
The visualization shows that Tashkent City has the highest number of cases, significantly higher than other regions. This elevated number may be influenced by the centralization of proceedings in the capital. Additionally, Tashkent serves as a hub for business and legal activities, hosting numerous company representatives and legal entities. This concentration of businesses and administrative bodies likely contributes to the higher case density in the capital. In contrast, regions like Karakalpakstan, Jizzakh Region, Sirdaryo Region, Navoiy Region, and Xorazm Region show comparatively few cases, which may reflect their population density, smaller urban centers, fewer businesses, and less centralized administrative functions compared to the capital.

No specific crime type appears to dominate in any particular region. The proportions of crime types are almost identical across all regions. I was quite surprised by this unexpected result. I assume that the proportions even out as the number of cases gets larger.

## Part 6. Conclusion

In conclusion, this project has provided a detailed analysis of criminal court cases in Uzbekistan for the year 2023. Through careful data collection, preprocessing, and visualization, the study has revealed notable trends in crime types, regional variations, and temporal patterns within the country's judicial processes. 

Economic crimes emerged as the most frequent category among court cases, highlighting a possible focus on financial oversight. Other prevalent categories were crimes against individuals and drug-related offenses. Temporal and regional analyses by category revealed consistent court case rates across most regions and time periods, with few variations. Most regional crime patterns did not diverge from the national trends. The only notable regional difference was the much higher number of cases in Tashkent City.

This data-driven approach not only underscores critical areas in criminal activity but also provides valuable insights for policymakers and law enforcement agencies aiming to allocate resources effectively and address crime prevention strategically. Future work could expand on this analysis by incorporating additional years or by analyzing the impact of legal reforms on crime rates, providing a richer understanding of Uzbekistan's criminal justice landscape.