# **Workshop: Capturing Human Perception of Neighborhoods Using Online Data with Language and Vision Models**

# **Train a new coder, Stella, using StreetLens**

In this session, we will explore how StreetLens helps automate neighborhood environment assessments.

StreetLens is a researcher-focused workflow that combines your domain expertise with vision-language models and street view imagery to generate meaningful, scalable annotations—no heavy coding required.

Curious how to scale expert-level evaluation across neighborhoods? Let's get started.

**Note:** The datasets provided are for demo purposes only and should not be shared outside this environment.

## Before we start:
Make sure to save a copy of this Jupyter notebook to your own Google Drive,
so that you can make edits and run the code yourself.

Go to **File → Save a copy in Drive**, then run the code in your copy.

<img src = "https://drive.google.com/uc?id=1RtVrYCxlT-nsNNrBWvuzfwfHgZkaXnzV" height = 400 width = 650>

1. Connect to the shared Google Drive where the data and StreetLens files are stored.

In [1]:
from google.colab import drive
drive.mount('/content/Shareddrives')

Mounted at /content/Shareddrives


In [2]:
%cd /content/Shareddrives/Shareddrives/UCGIS2025-Workshop-CSENG-Computer\ Science

/content/Shareddrives/Shareddrives/UCGIS2025-Workshop-CSENG-Computer Science


2. Install the required packages for StreetLens.

In [3]:
!pip install -r requirements.txt



3. Load the case study data previously prepared for StreetLens.

In [4]:
from streetlens.data_processor import *
from streetlens.automated_prompt_tuner import *
from streetlens.vision_language_model_processor import *

codebook_path = './dataset/annotation/sso_codebook.json'
paper_path = './dataset/paper/abstract.json'
annotation_path = './dataset/annotation/sso_annotation.csv'
street_block_id = ['281']
# street_block_id = ['62146', '281', '282', '9576']
agent_annotation_path = './dataset/annotation/agent_annotation.csv'

data_config = DataProcessor.DataConfig(
    codebook_path=codebook_path,
    annotation_path=annotation_path,
    paper_path=paper_path,
    street_block_id=street_block_id,
)
data_processor = DataProcessor(data_config=data_config)

data_config.image_dir = './dataset/img/'
data_config.agent_annotation_path = agent_annotation_path
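
Before constructing the processors, it can help to confirm that the input files referenced above are present in the working directory. The following is a minimal sketch using only the standard library. `find_missing_paths` is a hypothetical helper written for illustration and is not part of StreetLens.

```python
from pathlib import Path


def find_missing_paths(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [p for p in paths if not Path(p).exists()]


# Paths copied from the cell above.
required = [
    './dataset/annotation/sso_codebook.json',
    './dataset/paper/abstract.json',
    './dataset/annotation/sso_annotation.csv',
    './dataset/img/',
]

missing = find_missing_paths(required)
if missing:
    print("Missing inputs:", missing)
else:
    print("All inputs found.")
```

If anything is reported as missing, the most likely cause is that the `%cd` step did not land in the workshop folder.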


4. Let’s configure the lightweight vision-language model that StreetLens will use.

In [5]:
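import torch  # imported explicitly rather than relying on the wildcard imports above

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"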
data_config.model_name = 'OpenGVLab/InternVL3-2B-hf'
vlm_processor = VLMProcessor(data_processor.data_config, device=device)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.


5. Let’s enable AutomatedPromptTuner to assign a role to Stella and refine the codebook questions.

In [6]:
automated_prompt_tuner = AutomatedPromptTuner(data_config=data_config, vlm_processor=vlm_processor)

In [7]:
# role prompt
role_prompt = automated_prompt_tuner.construct_role_prompt()

Stella: Looks like we have 2 abstract(s). Let me step into a new role. I’m ready for it!
Stella: I am an expert in mixed methods research and qualitative analysis, with a specific focus on comparing researchers' and adolescents' observations of neighborhood environments. My expertise includes exploring the shared and unique aspects of these observations and examining ethnic-racial label usage in the context of segregated neighborhoods. I have conducted studies that highlight these aspects and have contributed to the understanding of how different perspectives on neighborhood environments


In [8]:
# codebook prompt
codebook_prompt = automated_prompt_tuner.construct_codebook_prompt()

Stella: I'm reading codebook...
Stella: I'm looking over the question-answer pairs for measure Disorder 3. I'm refining the codebook prompt...
Stella: I'm looking over the question-answer pairs for measure Decay 1. I'm refining the codebook prompt...
Stella: I'm looking over the question-answer pairs for measure Decay 2. I'm refining the codebook prompt...
Stella: I'm looking over the question-answer pairs for measure SS4. I'm refining the codebook prompt...
Stella: I'm looking over the question-answer pairs for measure SS5. I'm refining the codebook prompt...
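
To make the idea of a role prompt plus a codebook question more concrete, here is a rough sketch of how the two might be combined into a single text prompt for a vision-language model. This is an illustration only: `build_prompt`, its parameters, and the example wording are all assumptions, and the prompts that `AutomatedPromptTuner` actually produces may look quite different.

```python
def build_prompt(role_prompt, question, answer_options):
    """Combine a role description, one codebook question, and its allowed answers."""
    options = "\n".join(f"{code}: {label}" for code, label in answer_options.items())
    return (
        f"{role_prompt}\n\n"
        f"Question: {question}\n"
        f"Answer with one of the following codes:\n{options}"
    )


# Illustrative inputs only; the real text would come from the role prompt
# and the refined codebook prompts generated above.
example = build_prompt(
    role_prompt="You are an expert in neighborhood observation.",
    question="Is graffiti visible in the image?",
    answer_options={0: "No", 1: "Yes"},
)
print(example)
```

Listing the allowed answer codes explicitly in the prompt is one common way to keep model output easy to parse into the same coding scheme human coders use.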


6. For each measure, Stella will review and annotate each image. Then StreetLens will generate an aggregated annotation file at the same level as the sample annotations made by human coders (e.g., 1 annotation per street block).
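
The aggregation step described above can be pictured with a small sketch. It assumes a simple majority vote over per-image labels; this rule is an assumption made for illustration, and StreetLens may aggregate differently. `aggregate_block` and the file names below are hypothetical.

```python
from collections import Counter


def aggregate_block(image_labels):
    """Collapse per-image labels into one block-level label by majority vote.

    Ties are broken in favor of the larger label (an arbitrary choice made here).
    """
    if not image_labels:
        raise ValueError("no image labels to aggregate")
    counts = Counter(image_labels.values())
    top = max(counts.values())
    return max(label for label, n in counts.items() if n == top)


# Hypothetical per-image labels for one street block and one measure.
per_image = {'a.jpg': 0, 'b.jpg': 0, 'c.jpg': 1, 'd.jpg': 0}
block_label = aggregate_block(per_image)
print(block_label)  # prints 0
```

Other rules are equally plausible. For a presence/absence measure, for example, taking the maximum label would mark a block as positive if any single image shows the feature.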

In [9]:
data_config.role_prompt = automated_prompt_tuner.role_prompt
data_config.codebook_prompt_dict = automated_prompt_tuner.codebook_prompt_dict
vlm_processor.generate_annotation()

Stella: Jumping into measure Disorder 3...

	Stella: I got the image from ./dataset/img/281/W/6.jpg. I'm starting annotation now...
	Stella: My annotation is 0 
	Stella: The image does not show any visible graffiti on buildings, signs, walls, fences, sidewalks, poles, or streets. The scene appears to be a quiet residential area with no signs of graffiti.

	Stella: I got the image from ./dataset/img/281/W/8.jpg. I'm starting annotation now...
	Stella: My annotation is 0 
	Stella: The image does not show any visible graffiti on buildings, signs, walls, fences, sidewalks, poles, or streets. The scene appears to be a quiet street with a car and some trees, but no graffiti is present. The focus is on the car and the surrounding environment, which does not include any markings or tags.

	Stella: I got the image from ./dataset/img/281/W/10.jpg. I'm starting annotation now...
	Stella: My annotation is 0 
	Stella: The image does not show any visible graffiti on buildings, signs, walls, fences, 

{'281/W': {'Disorder 3': [0,
   [{'6.jpg': 'The image does not show any visible graffiti on buildings, signs, walls, fences, sidewalks, poles, or streets. The scene appears to be a quiet residential area with no signs of graffiti.'},
    {'8.jpg': 'The image does not show any visible graffiti on buildings, signs, walls, fences, sidewalks, poles, or streets. The scene appears to be a quiet street with a car and some trees, but no graffiti is present. The focus is on the car and the surrounding environment, which does not include any markings or tags.'},
    {'10.jpg': 'The image does not show any visible graffiti on buildings, signs, walls, fences, sidewalks, poles, or streets. The scene appears to be a quiet street with no signs of graffiti.'},
    {'1.jpg': 'The image does not show any visible graffiti on buildings, signs, walls, fences, sidewalks, poles, or streets. The scene appears to be a quiet, possibly residential area with no signs of graffiti.'},
    {'3.jpg': 'The image does 