
Index out of bounds in Aggregation Query #4

Closed
InkosiZhong opened this issue Nov 26, 2022 · 23 comments

Comments

@InkosiZhong

I followed the Reproducing Experiments instructions and got an index-out-of-bounds error when executing NightStreetAggregateQuery and NightStreetAveragePositionAggregateQuery in night_street_offline.py.

Target DNN Invocations: 100%|██████████████████████| 7000/7000 [00:00<00:00, 2924606.83it/s]
Propagation:   0%|                                               | 0/973136 [00:00<?, ?it/s]/home/inkosizhong/Lab/VideoQuery/tasti/tasti/query.py:37: RuntimeWarning: invalid value encountered in true_divide
  weights = weights / weights.sum()
Propagation: 100%|███████████████████████████████| 973136/973136 [00:25<00:00, 38122.22it/s]
r 1
Traceback (most recent call last):
  File "tasti/examples/night_street_offline.py", line 276, in <module>
    query.execute_metrics(err_tol=0.01, confidence=0.05)
  File "/home/inkosizhong/Lab/VideoQuery/tasti/tasti/query.py", line 86, in execute_metrics
    res = self._execute(err_tol, confidence, y)
  File "/home/inkosizhong/Lab/VideoQuery/tasti/tasti/query.py", line 69, in _execute
    estimate, nb_samples = sampler.sample()
  File "/home/inkosizhong/Lab/VideoQuery/blazeit/blazeit/aggregation/samplers.py", line 58, in sample
    sample = self.get_sample(Y_pred, Y_true, t)
  File "/home/inkosizhong/Lab/VideoQuery/blazeit/blazeit/aggregation/samplers.py", line 105, in get_sample
    yt_samp = Y_true[nb_samples]
IndexError: index 973136 is out of bounds for axis 0 with size 973136

This error is raised in Sampler.sample() in blazeit/aggregation/samplers.py, where the index variable t increases without bound.
I guess that, under normal circumstances, EBS makes the sampling stop before reaching the upper bound (len(Y_true)), but I don't know why it keeps sampling until the last frame during the reproduction process.
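The `weights / weights.sum()` warning in the log above is a clue: if the propagated weights sum to zero, the whole vector becomes NaN, and NaN sampling weights can break the weighted sampling that the stopping rule relies on. A minimal sketch of guarding against that case (not the actual tasti code; `normalize_weights` is a hypothetical helper):

```python
import numpy as np

def normalize_weights(weights):
    """Normalize to a probability vector; fall back to uniform if the sum is invalid."""
    weights = np.asarray(weights, dtype=np.float64)
    total = weights.sum()
    if total <= 0 or not np.isfinite(total):
        # All-zero (or non-finite) weights: dividing would produce NaNs,
        # exactly the "invalid value encountered in true_divide" warning above.
        return np.full(len(weights), 1.0 / len(weights))
    return weights / total

print(normalize_weights([0.0, 0.0, 0.0]))  # uniform fallback instead of NaNs
print(normalize_weights([1.0, 3.0]))       # [0.25 0.75]
```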

@ddkang
Collaborator

ddkang commented Nov 26, 2022

Are you using the correct datasets and dates?

@ddkang
Collaborator

ddkang commented Nov 26, 2022

This branch should reproduce SIGMOD, if you're using the correct data: https://github.com/stanford-futuredata/tasti/tree/sigmod

@InkosiZhong
Author

I have switched to the sigmod branch, but it behaves the same.
Besides, I noticed that the hyper-parameters in night_street_offline.py and night_street_online.py are different. For example:

  1. nb_train=3000 and nb_buckets=7000 in NightStreetOfflineConfig, but nb_train=1000 and nb_buckets=1000 in NightStreetOnlineConfig.
  2. NightStreetAggregateQuery: err_tol=0.01 and confidence=0.05 offline, while err_tol=0.1 and confidence=0.1 online.
    I'm wondering whether that is intended?

@InkosiZhong
Author

I am using the 2017-12-14.zip and 2017-12-17.zip downloaded from here, and the DNN outputs from here, following your guidance.

@InkosiZhong
Author

My complete process is as follows:

  1. set up the environment
# here I use the master branch because the sigmod branch has no tasti.yml
git clone https://github.com/stanford-futuredata/tasti.git 
cd tasti
conda env create -f tasti.yml
conda activate tasti3
cd ..

git clone https://github.com/stanford-futuredata/swag-python.git
cd swag-python/
conda install -c conda-forge opencv
pip install -e .
cd ..

git clone https://github.com/stanford-futuredata/blazeit.git
cd blazeit/
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
conda install -c conda-forge pyclipper
pip install -e .
cd ..

git clone https://github.com/stanford-futuredata/supg.git
cd supg/
pip install pandas feather-format
pip install -e .
cd ..

git clone -b sigmod https://github.com/stanford-futuredata/tasti.git tasti_sigmod
cd tasti_sigmod/
pip install -r requirements.txt
pip install -e .
mkdir cache # will be written in night_street_offline.py
  2. download the datasets (2017-12-14.zip and 2017-12-17.zip) and the DNN outputs (jackson-town-square-2017-12-14.json and jackson-town-square-2017-12-17.json)
  3. create a folder named datasets, unzip, and move 2017-12-14, 2017-12-17, and the 2 json files into it
  4. modify ROOT_DATA_DIR in tasti/examples/night_street_offline.py to point to .../datasets
  5. run python tasti/examples/night_street_offline.py
  6. tasti performs do_mining/do_training/do_infer/do_bucketting and fails at query.execute_metrics(err_tol=0.01, confidence=0.05) (NightStreetAggregateQuery)

@ddkang
Collaborator

ddkang commented Nov 28, 2022

Try an error tolerance of 0.05

@InkosiZhong
Author

Unfortunately, it doesn't work. I even tried an error tolerance of 0.9 but it still failed.
I printed the prediction value y_pred[i] and the true value float(y_true[i]) in propagation() after line 39 of query.py, and I found them to be very different (e.g. y_pred[i]=3 while y_true[i]=0).
Besides, top_distances even contains negative values.
Does that mean something went wrong during index generation? Should I try other configurations from night_street_online.py, such as nb_buckets=1000?

@ddkang
Collaborator

ddkang commented Nov 29, 2022

Yes I'm pretty sure something is wrong.

What are the hashes of the CSV files? This is what I see

e878ca724fd42d490dcc5d4ad8aa16cc  jackson-town-square-2017-12-14.csv
a72522b880023dfafea34e81448692b2  jackson-town-square-2017-12-17.csv
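For anyone else comparing against these hashes, the digests can also be computed portably with Python's hashlib instead of md5sum; `md5sum` below is just an illustrative helper, and the filenames are the ones from this thread:

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

for name in ("jackson-town-square-2017-12-14.csv",
             "jackson-town-square-2017-12-17.csv"):
    try:
        print(name, md5sum(name))
    except FileNotFoundError:
        print(name, "not found")
```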

@InkosiZhong
Author

Oh, the MD5 values are different.

MD5 (jackson-town-square-2017-12-14.csv) = 8abae92a0ac3b9f6513ca23ab2549430
MD5 (jackson-town-square-2017-12-17.csv) = b51637ba2b45b9eea37f9cfc81b562d8

But I re-downloaded the files from Google Drive and the MD5 values are still different from yours (and the same as mine above).

@ddkang
Collaborator

ddkang commented Nov 29, 2022

Try downloading them again, I may have uploaded the wrong version

@InkosiZhong
Author

Thank you very much! Now the MD5 values are correct. I will try again.

@InkosiZhong
Author

Unfortunately, the error still exists. Here are the MD5 values of the datasets downloaded from Google Drive.
Can you please verify whether they are correct?
Besides, I noticed there is a branch called tasti-compatibility in the blazeit project. Should I switch to that branch?

MD5 (2017-12-14-001.zip) = 11e1f424127a2463d0908fedd86719fd
MD5 (2017-12-17-002.zip) = bea086f82bdcc5f40a91d0ec2fbde4dd

@ddkang
Collaborator

ddkang commented Nov 29, 2022

Try the branch

@InkosiZhong
Author

InkosiZhong commented Nov 30, 2022

I have tried all branches of blazeit, including bugfix and tasti-compatibility, but none of them work either.
Unfortunately, I suspect it might be a bug in blazeit. Here are similar issues:
Error on reproduce the aggregation experiments (step 3 of Reproducing experiments section)
README is outdated. Please update.
However, blazeit seems to be no longer maintained, and none of these issues have been resolved. Could you please share a working version of blazeit that you used in your experiments? Thank you so much.

@ddkang
Collaborator

ddkang commented Nov 30, 2022

Are you sure you used the correct package versions of all packages in the SIGMOD branch?

@InkosiZhong
Author

I'm using the tasti.yml from the master branch to create the conda environment, and the requirements.txt from the SIGMOD branch (which is the same as in the master branch).
The only thing I have modified in the requirements.txt is

numba==0.50.1 -> numba==0.51

This change was needed because of the error below:

(tasti3)$ pip install -r requirements.txt
...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behavior is the source of the following dependency conflicts.
datashader 0.13.0 requires numba>=0.51, but you have numba 0.50.1 which is incompatible.

since datashader=0.13.0 is specified by tasti.yml.
Besides, I skipped the PyTorch installation step in the README.md,

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

because these packages are already specified in the tasti.yml,

pytorch=1.12.1=py3.8_cuda10.2_cudnn7.6.5_0
torchaudio=0.12.1=py38_cu102
torchvision=0.13.1=py38_cu102
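To rule out version drift like this, the installed versions can be checked against expected pins with importlib.metadata. The pin list below is illustrative, taken only from the versions mentioned in this thread, not an authoritative requirements file:

```python
from importlib.metadata import version, PackageNotFoundError

def check(pkg, expected):
    """Return a status string comparing the installed version to an expected pin."""
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        return f"{pkg}: not installed"
    status = "OK" if installed.startswith(expected) else "MISMATCH"
    return f"{pkg}: installed {installed}, expected {expected} -> {status}"

# Pins discussed above (illustrative)
for pkg, pin in [("numba", "0.51"), ("datashader", "0.13.0"), ("torch", "1.12.1")]:
    print(check(pkg, pin))
```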

@ddkang
Collaborator

ddkang commented Nov 30, 2022

Commit 90461d5ad6af279d12f6589a90c1a987f261e352 on tasti with commit 8c147f3a0b42e3dcc0b7718b2c078b3dafaba7bf on blazeit works for me when run from scratch

@InkosiZhong
Author

I have no idea which step went wrong.
Could you please share the following files, to help me check whether the problem is in the tasti part or the blazeit part?

cache
  |- embeddings.npy
  |- model.pt
  |- reps.npy
  |- topk_dists.npy
  |- topk_reps.npy

Thank you very much.

@InkosiZhong
Author

I have tried the following configuration:

  1. skipped the conda environment creation (from tasti.yml), and successfully installed numba==0.50.1
  2. ran python tasti/examples/night_street_online.py

Unfortunately, the same error still exists.

@InkosiZhong
Author

I rechecked the correspondence between the data and the labels, and I see an offset between them.
The code below is based on VideoDataset and the way you read the .csv files. You can see the visualization here.

import cv2
import pandas as pd
from collections import defaultdict
# VideoDataset is the class defined in tasti/examples/night_street_offline.py

# read the .csv label file
len_14 = 973489  # number of frames in the 2017-12-14 video
df = pd.read_csv('../datasets/jackson-town-square/jackson-town-square-2017-12-14.csv')
df = df[df['object_name'].isin(['car'])]
frame_to_rows = defaultdict(list)
for row in df.itertuples():
    frame_to_rows[row.frame].append(row)
labels = [frame_to_rows[frame_idx] for frame_idx in range(len_14)]

# prepare the video dataset
video = VideoDataset(
    video_fp='../datasets/jackson-town-square/2017-12-14'
)

# visualization: draw the labeled boxes on every 8th frame
cnt = 0
for i, frame in enumerate(video):
    if i % 8 != 0:
        continue
    if cnt > 60:
        break
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    if labels[i]:
        for label in labels[i]:
            print(label)
            # fields 4-7 of the row tuple hold the box coordinates
            frame = cv2.rectangle(frame, (int(label[4]), int(label[5])),
                                  (int(label[6]), int(label[7])), (0, 255, 0), 2)
        cnt += 1
    frame = cv2.resize(frame, None, fx=0.25, fy=0.25)
    cv2.imwrite(f'annotation/{i}.png', frame)

@ddkang
Collaborator

ddkang commented Dec 5, 2022

Oops, sorry about the video issue. Thank you for investigating

@Christosc96

Does that mean the given version of the night_street dataset is flawed? If so, is there a fix or a corrected version of the data?

@InkosiZhong
Author

For me, I corrected the data by subtracting 300 (an estimated value) from all the frame numbers in the json files. However, the dataset offset is only one of the causes. I later found that the real cause of this problem was the numba version: using numpy functions under the njit decorator with prange in numba==0.50.1 leads to abnormal results (usually all zeros). This bug is fixed in later versions. Now I use the latest numba and TASTI works well.
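The frame-number correction described above (subtracting an estimated offset of 300) can be sketched on a toy label table; `shift_labels` is a hypothetical helper, and the `frame` column name matches the CSV labels used earlier in this thread:

```python
import pandas as pd

def shift_labels(df, offset=300):
    """Shift all frame numbers earlier by `offset`; drop rows that fall
    before frame 0 (labels that no longer match any decoded frame)."""
    out = df.copy()
    out["frame"] = out["frame"] - offset
    return out[out["frame"] >= 0].reset_index(drop=True)

# Toy labels standing in for jackson-town-square-2017-12-14.csv
labels = pd.DataFrame({"frame": [100, 300, 301, 1000],
                       "object_name": ["car"] * 4})
# frame 100 shifts below 0 and is dropped; 300, 301, 1000 become 0, 1, 700
print(shift_labels(labels))
```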
