#Codes from Mehdi G https://www.kaggle.com/servietsky/eazy-way-house-price-pycaret

A total of 1.5 million people died from TB in 2018 (including 251 000 people with HIV). Worldwide, TB is one of the top 10 causes of death and the leading cause from a single infectious agent (above HIV/AIDS).

In 2018, an estimated 10 million people fell ill with tuberculosis (TB) worldwide: 5.7 million men, 3.2 million women and 1.1 million children. There were cases in all countries and age groups, but TB is curable and preventable.

In 2018, 1.1 million children fell ill with TB globally, and there were 205 000 child deaths due to TB (including among children with HIV). Child and adolescent TB is often overlooked by health providers and can be difficult to diagnose and treat.

In 2018, the 30 high TB burden countries accounted for 87% of new TB cases. Eight countries accounted for two thirds of the total, with India leading the count, followed by China, Indonesia, the Philippines, Pakistan, Nigeria, Bangladesh and South Africa.

Multidrug-resistant TB (MDR-TB) remains a public health crisis and a health security threat. WHO estimates that there were 484 000 new cases with resistance to rifampicin, the most effective first-line drug, of which 78% had MDR-TB.

Globally, TB incidence is falling at about 2% per year. This needs to accelerate to a 4–5% annual decline to reach the 2020 milestones of the End TB Strategy.
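
To make the gap between the two rates concrete, here is a minimal sketch of how a constant annual decline compounds. The five-year horizon and the baseline of 100 are assumptions chosen for illustration, not figures from the WHO fact sheet.

```python
def incidence_after(years, annual_decline, baseline=100.0):
    """Incidence left after compounding a constant annual decline."""
    return baseline * (1.0 - annual_decline) ** years

# Compare the current pace (about 2%) with the middle of the 4-5% range.
# The 5-year horizon is an illustrative assumption.
for rate in (0.02, 0.045):
    remaining = incidence_after(5, rate)
    print(f"{rate:.1%} per year for 5 years: "
          f"{100.0 - remaining:.1f}% cumulative reduction")
```

Under these assumptions, the faster rate yields roughly twice the cumulative reduction over the same period.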

An estimated 58 million lives were saved through TB diagnosis and treatment between 2000 and 2018.
Ending the TB epidemic by 2030 is among the health targets of the Sustainable Development Goals. https://www.who.int/news-room/fact-sheets/detail/tuberculosis

![](https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcTtLxwVYzM4XGLOpQoBsWULo6Z-YHPECoNruQ&usqp=CAU)

In [None]:
!pip install pycaret

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from pycaret.regression import *
from pandas_profiling import ProfileReport 
import seaborn as sns
import matplotlib.pyplot as plt
import warnings

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

#Who is most at risk?

Tuberculosis mostly affects adults in their most productive years. However, all age groups are at risk. Over 95% of cases and deaths are in developing countries.

People who are infected with HIV are 19 times more likely to develop active TB. The risk of active TB is also greater in persons suffering from other conditions that impair the immune system. People with undernutrition are 3 times more at risk. Alcohol use disorder and tobacco smoking increase the risk of TB disease by a factor of 3.3 and 1.6, respectively.

#Global impact of TB

In 2018, 87% of new TB cases occurred in the 30 high TB burden countries. Eight countries accounted for two thirds of the new TB cases: India, China, Indonesia, Philippines, Pakistan, Nigeria, Bangladesh and South Africa.

#Symptoms and diagnosis

Common symptoms of active lung TB are cough with sputum and blood at times, chest pains, weakness, weight loss, fever and night sweats. Many countries still rely on a long-used method called sputum smear microscopy to diagnose TB. Trained laboratory technicians look at sputum samples under a microscope to see if TB bacteria are present. Tuberculosis is particularly difficult to diagnose in children. https://www.who.int/news-room/fact-sheets/detail/tuberculosis

In [None]:
df = pd.read_csv('../input/hackathon/task_2-Tuberculosis_infection_estimates_for_2018.csv', encoding='utf8')
df.head()

#Treatment

TB is a treatable and curable disease. Active, drug-susceptible TB disease is treated with a standard 6-month course of 4 antimicrobial drugs, provided together with information and support for the patient from a health worker or trained volunteer. Without such support, treatment adherence is more difficult.
Between 2000 and 2018, an estimated 58 million lives were saved through TB diagnosis and treatment.

#TB and HIV

People living with HIV are 19 (15-22) times more likely to develop active TB disease than people without HIV.

HIV and TB form a lethal combination, each speeding the other's progress. In 2018 about 251 000 people died of HIV-associated TB. In 2018, there were an estimated 862 000 new cases of TB amongst people who were HIV-positive, 72% of whom were living in Africa.
https://www.who.int/news-room/fact-sheets/detail/tuberculosis

In [None]:
report_df = ProfileReport(df)
report_df

#Multidrug-resistant TB (MDR-TB)

Anti-TB medicines have been used for decades and strains that are resistant to one or more of the medicines have been documented in every country surveyed. Drug resistance emerges when anti-TB medicines are used inappropriately, through incorrect prescription by health care providers, poor quality drugs, and patients stopping treatment prematurely.

WHO also approved in 2016 a rapid diagnostic test to quickly identify these patients. Sixty-two countries have started using shorter MDR-TB regimens. By the end of 2018, 90 countries reported having introduced bedaquiline and 57 countries reported having introduced delamanid, in an effort to improve the effectiveness of MDR-TB treatment regimens.

#Global commitments and the WHO response

On 26 September 2018, the United Nations (UN) held its first-ever high-level meeting on TB, elevating discussion about the status of the TB epidemic and how to end it to the level of heads of state and government. It followed the first global ministerial conference on TB hosted by WHO and the Russian government in November 2017. The outcome was a political declaration agreed by all UN Member States, in which existing commitments to the Sustainable Development Goals (SDGs) and WHO’s End TB Strategy were reaffirmed, and new ones added. https://www.who.int/news-room/fact-sheets/detail/tuberculosis

#Numerical Features

In [None]:
index_int_float = ['iso_numeric', 'year', 'e_hh_size', 'prevtx_data_available', 'newinc_con04_prevtx', 'ptsurvey_newinc', 'ptsurvey_newinc_con04_prevtx', 'e_prevtx_eligible', 'e_prevtx_eligible_lo', 'e_prevtx_eligible_hi', 'e_prevtx_kids_pct', 'e_prevtx_kids_pct_lo', 'e_prevtx_kids_pct_hi']      

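# One violin plot per numeric column, on a 3 x 5 grid (13 columns)
plt.figure(figsize=[20,12])
i = 1
for col in index_int_float :
    plt.subplot(3,5,i)
    sns.violinplot(y=col, data= df)  # values on the y-axis give a vertical violin
    sns.despine()
    i = i+1
plt.tight_layout()
plt.show()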

#The WHO Contribution

Six core functions are being pursued by WHO to contribute to achieving the targets of the UN (United Nations) high-level meeting political declaration, SDGs (Sustainable Development Goals), End TB Strategy and WHO strategic priorities:

Providing global leadership to end TB through strategy development, political and multisectoral engagement, strengthening review and accountability, advocacy, and partnerships, including with civil society;

Shaping the TB research and innovation agenda and stimulating the generation, translation and dissemination of knowledge;

Setting norms and standards on TB prevention and care and promoting and facilitating their implementation;

Developing and promoting ethical and evidence-based policy options for TB prevention and care;

Ensuring the provision of specialized technical support to Member States and partners jointly with WHO regional and country offices, catalyzing change, and building sustainable capacity;

Monitoring and reporting on the status of the TB epidemic and progress in financing and implementation of the response at global, regional and country levels. https://www.who.int/news-room/fact-sheets/detail/tuberculosis

#Categorical Features

In [None]:
index_str = ['g_whoregion', 'country', 'iso2', 'iso3', 'source_hh']

# One scatter plot per categorical column against the target, side by side
plt.figure(figsize=[30,10])
i = 1
for col in index_str :
    plt.subplot(1,5,i)
    sns.scatterplot(x=col, y = 'e_hh_size' ,data= df)
    sns.despine()
    i = i+1
plt.tight_layout()
plt.show()

In [None]:
int_features = ['iso_numeric', 'year', 'prevtx_data_available', 'newinc_con04_prevtx', 'ptsurvey_newinc', 'ptsurvey_newinc_con04_prevtx', 'e_prevtx_eligible', 'e_prevtx_eligible_lo', 'e_prevtx_eligible_hi', 'e_prevtx_kids_pct', 'e_prevtx_kids_pct_lo', 'e_prevtx_kids_pct_hi']
        

float_features = [ ]

obj_features = ['g_whoregion', 'country', 'iso2', 'iso3', 'source_hh']

exp_reg = setup(df, #Train Data
                target = 'e_hh_size',  #Target
                categorical_features = obj_features, # Categorical Features
                numeric_features = int_features + float_features, # Numeric Features
                normalize = True, # Normalize Dataset
                remove_outliers = True, # Remove 5% Outliers
                remove_multicollinearity = True, # Remove Multicollinearity
                silent = True # Process Automation
               )
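
As a rough illustration of what `normalize = True` does to a numeric column, here is a minimal sketch of z-score scaling. It assumes the z-score method is the one in use, and the example values are made up; it is not PyCaret's own implementation.

```python
import statistics

def zscore(values):
    """Rescale values to zero mean and unit (population) standard deviation.

    Assumes the values are not all identical (non-zero standard deviation).
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Made-up column values, for illustration only
example = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = zscore(example)
print([round(v, 2) for v in scaled])
```

After scaling, every numeric feature sits on a comparable scale, which matters mostly for linear models such as Ridge and Bayesian Ridge.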

#Compare Models

In [None]:
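# Note: `blacklist` is the argument name in early PyCaret releases; from PyCaret 2.1 it is called `exclude`
compare_models(blacklist = ['tr', 'catboost'], sort = 'RMSLE')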
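
The models are ranked by RMSLE (root mean squared logarithmic error). Here is a minimal sketch of that metric, written with the standard library only. The household sizes below are made-up values, not rows from the dataset.

```python
import math

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

# Made-up household sizes, for illustration only
actual = [3.5, 4.2, 5.0]
predicted = [3.7, 4.0, 5.6]
print(round(rmsle(actual, predicted), 4))
```

Because the error is taken on a log scale, RMSLE penalizes relative rather than absolute differences, so a lower value means predictions are closer in ratio to the observed household sizes.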

#Create and Train Models

In [None]:
model_br = create_model('br')
model_lightgbm = create_model('lightgbm')
model_xgboost = create_model('xgboost')
model_ridge = create_model('ridge')

#Models Tuning

In [None]:
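# PyCaret 1.x takes the estimator ID string here; PyCaret 2.0+ expects the trained model object instead, e.g. tune_model(model_br)
tuned_br = tune_model('br')
tuned_lightgbm = tune_model('lightgbm')
tuned_xgboost = tune_model('xgboost')
tuned_ridge = tune_model('ridge')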

#Display Learning Curve

#Bayesian Ridge

In [None]:
plot_model(tuned_br, plot = 'learning')

I tried dropping `source_hh` after getting the error `Do not support non-ASCII characters in feature name`, but the error persisted. As a result, I couldn't plot the other curves; they only produced more errors.
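
One possible workaround, sketched here under an assumption: this message is raised by LightGBM, and a likely cause is that one-hot encoding turns categorical values with accented characters (for example some country names) into feature names. The helper `to_ascii` below is hypothetical and was not part of the original run; it transliterates such values to plain ASCII.

```python
import unicodedata

def to_ascii(text):
    """Transliterate accented characters to their closest ASCII form."""
    # NFKD splits each accented letter into a base letter plus combining marks
    normalized = unicodedata.normalize("NFKD", str(text))
    # Encoding with errors="ignore" drops the marks and any other non-ASCII character
    return normalized.encode("ascii", "ignore").decode("ascii")

# Illustrative country names containing non-ASCII characters
for name in ["Côte d'Ivoire", "Curaçao", "India"]:
    print(name, "->", to_ascii(name))
```

If this is indeed the cause, applying the helper to each categorical column before calling `setup`, for instance `df[col] = df[col].map(to_ascii, na_action='ignore')` for every `col` in `obj_features`, might let the remaining plots run. This has not been verified against the dataset.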

In [None]:
#codes from Rodrigo Lima  @rodrigolima82
from IPython.display import Image
Image(url = 'https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcRGF3rClhVjQr2cVSoBEbwOs4eaQLl3KD8CeQ&usqp=CAU',width=400,height=400)

viivhealthcare.com

![](https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcRno46WTA4aUE0IM-DMaFaKc7YnjARqD33aUA&usqp=CAU)
thejakartapost.com

That's it, Kaggle Notebook Runner: Marília Prata  @mpwolke