<a href="https://colab.research.google.com/github/guilhermebispo/nih-chestxray-label-validation/blob/main/app_web.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NIH Chest X-ray Labeler — Colab (Public URL only)

This notebook sets up a **minimal Flask web app** for labeling NIH Chest X-ray images with **Portuguese labels**. It collects each **expert's name and CRM** (the Brazilian medical license number) and appends every annotation to a CSV file as soon as it is submitted. The app is reachable through a **public Cloudflare quick-tunnel URL** created with `cloudflared` (no account required).

**Output file:** `/content/web_labeler/data/labels_experts.csv`
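
Once experts have submitted some annotations (that is, after the steps below have run), the output file can be summarized with the standard library alone. The following is a minimal sketch, assuming the column layout written by the app in section 3, where `rotulos_nih` is a `|`-separated list and `acao` is one of `SAVE`, `SKIP`, or `UNKNOWN`. The helper name `count_saved_labels` is hypothetical and not part of the repository.

```python
import csv
from collections import Counter
from pathlib import Path


def count_saved_labels(csv_path):
    """Count NIH labels across rows whose action is SAVE.

    Hypothetical helper; assumes the header written by the labeler app:
    timestamp, especialista, crm, imagem, rotulos_pt, rotulos_nih, acao.
    """
    counts = Counter()
    path = Path(csv_path)
    if not path.exists():
        return counts  # nothing has been labeled yet
    with open(path, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            if row.get('acao') != 'SAVE':
                continue  # only count rows the expert explicitly saved
            labels = [l for l in (row.get('rotulos_nih') or '').split('|') if l]
            counts.update(labels)
    return counts


if __name__ == '__main__':
    print(count_saved_labels('/content/web_labeler/data/labels_experts.csv'))
```

Rows from several experts for the same image are counted separately here; grouping by `imagem` first would be the natural next step for agreement analysis.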


## 1) Install dependencies

In [1]:
!pip -q install flask==3.0.3 pandas==2.2.2
!wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
!dpkg -i cloudflared-linux-amd64.deb >/dev/null 2>&1 || true
print('✅ Dependencies installed')

✅ Dependencies installed


## 2) Fetch sample CSV and images from the GitHub repo

In [2]:
!rm -rf nih-chestxray-label-validation
!git clone https://github.com/guilhermebispo/nih-chestxray-label-validation.git

import os, shutil
os.makedirs('/content/web_labeler/data', exist_ok=True)
os.makedirs('/content/web_labeler/static/images', exist_ok=True)

shutil.copy('/content/nih-chestxray-label-validation/sample_labels.csv', '/content/web_labeler/data/sample_labels.csv')

for f in os.listdir('/content/nih-chestxray-label-validation/images'):
    src = f'/content/nih-chestxray-label-validation/images/{f}'
    dst = f'/content/web_labeler/static/images/{f}'
    shutil.copy(src, dst)
print('✅ Copied CSV and images into /content/web_labeler/...')

Cloning into 'nih-chestxray-label-validation'...
remote: Enumerating objects: 20, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 20 (delta 1), reused 11 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (20/20), 3.12 MiB | 9.04 MiB/s, done.
Resolving deltas: 100% (1/1), done.
✅ Copied CSV and images into /content/web_labeler/...
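
The app only shows images that are both listed in `sample_labels.csv` and present on disk, so a mismatch between the two silently shortens the labeling queue. The following is a minimal sketch of a consistency check, assuming the CSV names images in the `Image Index` column that the app reads. The helper name `find_mismatches` is hypothetical.

```python
import csv
from pathlib import Path


def find_mismatches(csv_path, img_dir, column='Image Index'):
    """Return (listed_but_missing, present_but_unlisted) as sorted lists.

    Hypothetical helper; assumes the CSV names images in `column`.
    """
    # utf-8-sig reads plain UTF-8 and also strips a BOM if one is present
    with open(csv_path, newline='', encoding='utf-8-sig') as f:
        listed = {row[column] for row in csv.DictReader(f) if row.get(column)}
    on_disk = {p.name for p in Path(img_dir).iterdir() if p.is_file()}
    return sorted(listed - on_disk), sorted(on_disk - listed)


if __name__ == '__main__':
    csv_file = Path('/content/web_labeler/data/sample_labels.csv')
    images = Path('/content/web_labeler/static/images')
    if csv_file.exists() and images.is_dir():
        missing, unlisted = find_mismatches(csv_file, images)
        print('Listed in CSV but missing on disk:', missing)
        print('On disk but not listed in CSV:', unlisted)
```

Two empty lists mean every listed image will appear in the labeling queue.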


## 3) Create Flask app (PT-BR labels, name & CRM, autosave)

In [None]:
# Write the Flask app to disk (fixed version, with line breaks for readability)
from pathlib import Path

app_path = Path('/content/web_labeler_app.py')

app_code = '''from flask import Flask, request, render_template_string, redirect, url_for
import os, csv
from pathlib import Path
from datetime import datetime
import pandas as pd

APP_DIR = Path('/content/web_labeler')
DATA_DIR = APP_DIR / 'data'
IMG_DIR  = APP_DIR / 'static' / 'images'
CSV_NIH  = DATA_DIR / 'sample_labels.csv'
CSV_OUT  = DATA_DIR / 'labels_experts.csv'

DATA_DIR.mkdir(parents=True, exist_ok=True)
IMG_DIR.mkdir(parents=True, exist_ok=True)

LABEL_MAP_PT2EN = {
    'Sem achado': 'No Finding',
    'Atelectasia': 'Atelectasis',
    'Cardiomegalia': 'Cardiomegaly',
    'Derrame pleural': 'Effusion',
    'Infiltração': 'Infiltration',
    'Massa': 'Mass',
    'Nódulo': 'Nodule',
    'Pneumonia': 'Pneumonia',
    'Pneumotórax': 'Pneumothorax',
    'Consolidação': 'Consolidation',
    'Edema': 'Edema',
    'Enfisema': 'Emphysema',
    'Fibrose': 'Fibrosis',
    'Espessamento pleural': 'Pleural_Thickening',
    'Hérnia': 'Hernia',
}
LABELS_PT = list(LABEL_MAP_PT2EN.keys())

app = Flask(
    __name__,
    static_folder=str(APP_DIR / 'static'),
    static_url_path='/static'
)

def load_available_images():
    if not CSV_NIH.exists():
        files = [p.name for p in IMG_DIR.glob('*') if p.suffix.lower() in {'.png','.jpg','.jpeg','.bmp','.tif','.tiff'}]
        return sorted(files)
    df = pd.read_csv(CSV_NIH)
    names = df['Image Index'].astype(str).tolist()
    return [n for n in names if (IMG_DIR / n).exists()]

def ensure_out_header():
    if not CSV_OUT.exists() or CSV_OUT.stat().st_size == 0:
        with open(CSV_OUT, 'w', newline='', encoding='utf-8') as f:
            w = csv.writer(f)
            w.writerow(['timestamp','especialista','crm','imagem','rotulos_pt','rotulos_nih','acao'])

@app.route('/')
def index():
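    # type=int falls back to the default on non-numeric input; clamp negatives to 0
    i = max(request.args.get('i', default=0, type=int), 0)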
    files = load_available_images()
    total = len(files)
    if total == 0:
        return '<h3>Nenhuma imagem encontrada em static/images. Coloque as imagens lá (nomes iguais ao \"Image Index\").</h3>'
    if i >= total:
        return '<h3>✅ Fim da lista. <a href=\"/?i=0\">Reiniciar</a></h3>'
    img_name = files[i]

    html = """
<html lang='pt-br'>
<head>
  <meta charset='utf-8'>
  <title>Rotulador de Raios-X</title>
  <style>
    body { font-family: Arial; margin: 20px; background: #fafafa; }
    .card { max-width: 950px; margin: auto; padding: 20px; background: white; border-radius: 12px; box-shadow: 0 0 8px #ccc; }
    img { max-width: 100%; border-radius: 8px; }
    button { margin: 5px; padding: 10px 15px; border: none; border-radius: 8px; cursor: pointer; }
    .save { background: #16a34a; color: white; }
    .skip { background: #f59e0b; color: white; }
    .unknown { background: #6b7280; color: white; }
  </style>
</head>
<body>
  <div class='card'>
    <h2>Rotulagem de Raios-X ({{i+1}} / {{total}})</h2>
    <p><b>Imagem:</b> {{img_name}}</p>
    <img src='{{ url_for("static", filename="images/" + img_name) }}'><br><br>
    <form method='POST' action='{{ url_for("submit") }}'>
      <input type='hidden' name='i' value='{{i}}'>
      <input type='hidden' name='img' value='{{img_name}}'>
      <label>Especialista:</label><br>
      <input name='especialista' required style='width:100%;padding:5px;'><br><br>
      <label>CRM:</label><br>
      <input name='crm' required style='width:100%;padding:5px;'><br><br>
      <label>Patologias (em português):</label><br>
      {% for lbl in labels %}
        <label><input type='checkbox' name='labels' value='{{lbl}}'> {{lbl}}</label><br>
      {% endfor %}
      <p><i>'Sem achado' é exclusivo.</i></p>
      <button class='save' name='action' value='SAVE'>Salvar e Próxima</button>
      <button class='skip' name='action' value='SKIP'>Pular</button>
      <button class='unknown' name='action' value='UNKNOWN'>Indefinido</button>
    </form>
  </div>
  <script>
    const boxes = document.querySelectorAll('input[type=checkbox]');
    const noneBox = Array.from(boxes).find(b => b.value === 'Sem achado');
    const others = Array.from(boxes).filter(b => b.value !== 'Sem achado');
    if(noneBox){
      noneBox.addEventListener('change', ()=>{ if(noneBox.checked){ others.forEach(o=>o.checked=false); }});
      others.forEach(o=>o.addEventListener('change', ()=>{ if(o.checked && noneBox.checked){ noneBox.checked=false; }}));
    }
  </script>
</body></html>
"""
    return render_template_string(html, img_name=img_name, labels=LABELS_PT, i=i, total=total)

@app.post('/submit')
def submit():
    ensure_out_header()
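    # type=int falls back to the default if the hidden field was tampered with
    i = request.form.get('i', default=0, type=int)
    img = request.form.get('img')
    esp = request.form.get('especialista','').strip()
    crm = request.form.get('crm','').strip()
    action = request.form.get('action','SAVE').upper()
    # Keep only known labels so an unexpected value cannot raise KeyError below
    labels_pt = [l for l in request.form.getlist('labels') if l in LABEL_MAP_PT2EN]
    # 'Sem achado' (No Finding) is exclusive: it overrides any other selection
    if 'Sem achado' in labels_pt:
        labels_pt = ['Sem achado']
    labels_nih = [LABEL_MAP_PT2EN[l] for l in labels_pt]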
    with open(CSV_OUT, 'a', newline='', encoding='utf-8') as f:
        w = csv.writer(f)
        w.writerow([datetime.utcnow().isoformat(), esp, crm, img, '|'.join(labels_pt), '|'.join(labels_nih), action])
    return redirect(url_for('index', i=i+1))

# Run Flask in background thread (port 7860). Cloudflared will expose it publicly.
def run_flask():
    app.run(host='0.0.0.0', port=7860, debug=False, use_reloader=False)

if __name__ == '__main__':
    import threading, time
    t = threading.Thread(target=run_flask, daemon=True)
    t.start()
    time.sleep(2)
    print('✅ Flask started on http://127.0.0.1:7860')
'''

# Save to disk
app_path.write_text(app_code, encoding='utf-8')
print('✅ Saved Flask app to', str(app_path))


✅ Saved Flask app to /content/web_labeler_app.py
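
The browser script makes 'Sem achado' (No Finding) exclusive, and `submit()` repeats that rule on the server before translating the Portuguese labels to NIH names. The rule is easier to test when isolated in a pure function. The following is a minimal sketch, not the app's own code: `normalize_labels` is a hypothetical helper, the map is shortened for illustration, and ignoring values that are missing from the map is a design choice of this sketch.

```python
def normalize_labels(labels_pt, label_map, none_label='Sem achado'):
    """Apply the exclusivity rule and translate PT labels to NIH names.

    Hypothetical helper: values missing from `label_map` are ignored,
    and `none_label` overrides every other selection.
    """
    known = [l for l in labels_pt if l in label_map]
    if none_label in known:
        known = [none_label]
    return known, [label_map[l] for l in known]


if __name__ == '__main__':
    # Shortened map for illustration; the app defines the full 15-entry map.
    demo_map = {'Sem achado': 'No Finding', 'Massa': 'Mass', 'Nódulo': 'Nodule'}
    pt, nih = normalize_labels(['Massa', 'Sem achado', 'Nódulo'], demo_map)
    print('|'.join(pt), '->', '|'.join(nih))
```

Both returned lists keep the same order, so joining each with `|` yields matching `rotulos_pt` and `rotulos_nih` columns.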


## 4) Start public URL (Cloudflared)

In [4]:
import subprocess, re, time, threading, runpy

# Start Flask in background by executing the app file
def _run_app():
    runpy.run_path('/content/web_labeler_app.py', run_name='__main__')
threading.Thread(target=_run_app, daemon=True).start()
time.sleep(2)

print('🚇 Starting cloudflared tunnel...')
proc = subprocess.Popen(
    ['cloudflared', 'tunnel', '--url', 'http://localhost:7860', '--no-autoupdate'],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
)
public_url = None
for _ in range(1200):
    line = proc.stdout.readline()
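    if not line:
        # An empty read means EOF; stop polling if cloudflared has exited
        if proc.poll() is not None:
            break
        time.sleep(0.1)
        continue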
    m = re.search(r'(https://[a-z0-9-]+\.trycloudflare\.com)', line.strip())
    if m:
        public_url = m.group(1)
        print('🌐 Public URL:', public_url)
        break
if not public_url:
    print('⚠️ Could not capture public URL. Check logs above.')

 * Serving Flask app 'web_labeler_app'
 * Debug mode: off


 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:7860
 * Running on http://172.28.0.12:7860
INFO:werkzeug:Press CTRL+C to quit


🚇 Starting cloudflared tunnel...
✅ Flask started on http://127.0.0.1:7860
🌐 Public URL: https://stranger-judy-hope-pull.trycloudflare.com
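
The cell above pauses for a fixed two seconds before opening the tunnel, which may be too short on a slow runtime. One alternative is to poll the port until it accepts TCP connections. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, and its default timeout and polling interval are assumptions.

```python
import socket
import time


def wait_for_port(host, port, timeout=15.0, interval=0.2):
    """Return True once host:port accepts TCP connections, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a server is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not listening yet; retry until the deadline
    return False
```

In this notebook it could stand in for the `time.sleep(2)` call, for example `wait_for_port('127.0.0.1', 7860)`, with the tunnel started only when it returns `True`.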
