The training dataset preparation #8

qianduoduolr · 2023-07-20T12:17:24Z

Hi, I am following your work to prepare the training data for MOVI-f.
Could you please give more details about the instructions for annotation generation?

I generally follow the function create_kubric_eval_train_dataset in this link, setting train_size=(512,512) and tracks_to_sample=2000 in create_point_tracking_dataset.
I also changed 'movi_e/256x256' in this link in order to generate MOVI-f.

Is that right?

nikitakaraevv · 2023-07-20T15:38:59Z

Hi @qianduoduolr, these are the settings that we used to prepare 11000 training sequences for Kubric MOVI-f:

create_point_tracking_dataset(
  train_size=(512, 512),
  shuffle_buffer_size=None,
  split="train",
  batch_dims=tuple(),
  repeat=True,
  vflip=False,
  random_crop=True,
  tracks_to_sample=2048,
  sampling_stride=4,
  max_seg_id=25,
  max_sampled_frac=0.1,
  num_parallel_point_extraction_calls=16
)

Most of these sequences are repeated with different crops because we used repeat=True and random_crop=True.

qianduoduolr · 2023-07-20T15:42:15Z

Thanks for your reply.
Did you change 'movi_e/256x256' in this link to movi_f?

nikitakaraevv · 2023-07-20T15:46:24Z

As far as I remember, we had to download the dataset first for it to work. Then, we load MOVI-f (not MOVI-e) from a local path.

qianduoduolr · 2023-07-20T16:09:20Z

Ok, thanks, I will try later.

Anderstask1 · 2023-10-06T14:03:45Z

Hi @nikitakaraevv, can you provide the link to download the MOVI-f data you used? Did you simply run create_point_tracking_dataset() with the mentioned parameters to generate annotations on the dataset? What file structure did you use for training?

FYI, the description of annotation generation in the main README seems incomplete, or at least not easy to follow, although I might be missing something here...

Thanks for your help!

NEbrahimi · 2023-11-08T02:16:56Z

Hello @nikitakaraevv and @ernestchu,
I'm encountering the same problem that @Anderstask1 described in the previous post. Could you please take a look and let us know if there's any update or potential solution on the horizon? Your assistance would be greatly appreciated.

nikitakaraevv · 2023-11-13T15:03:37Z

Hi @Anderstask1 and @NEbrahimi,
I apologize for missing your messages as the issue was closed. I modified this function:

https://github.com/google-research/kubric/blob/e140e24e078d5e641c4ac10bf25743059bd059ce/challenges/point_tracking/dataset.py#L992

Here are the changes I made:

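import os

import mediapy as media
import numpy as np
import tensorflow_datasets as tfds
from PIL import Image

# create_point_tracking_dataset() and plot_tracks() come from the same
# Kubric dataset.py linked above; this code goes into that script.

dataset_dir = "./kubric_movi_f"
os.makedirs(dataset_dir, exist_ok=True)

ds = tfds.as_numpy(create_point_tracking_dataset(
  train_size=(512, 512),
  shuffle_buffer_size=None,
  split="train",
  batch_dims=tuple(),
  repeat=True,
  vflip=False,
  random_crop=True,
  tracks_to_sample=2048,
  sampling_stride=4,
  max_seg_id=25,
  max_sampled_frac=0.1,
  num_parallel_point_extraction_calls=16
))

# repeat=True makes the dataset infinite, so the loop needs a stopping point.
# NUM_SEQUENCES is a hypothetical limit (11000 is the count mentioned above).
NUM_SEQUENCES = 11000

for i, data in enumerate(ds):
    if i >= NUM_SEQUENCES:
        break
    print(i)
    seq_num = f"{i:04d}"  # zero-padded sequence name: 0000, 0001, ...
    # makedirs also creates the parent <seq_num> directory
    os.makedirs(os.path.join(dataset_dir, seq_num, "frames"), exist_ok=True)
    # The rescaling assumes frames in [-1, 1]; map to [0, 255] and save one PNG per frame
    for i_frame, frame in enumerate(data["video"]):
        Image.fromarray((((frame + 1) / 2.0) * 255.0).astype("uint8")).save(
            os.path.join(dataset_dir, seq_num, "frames", f"{i_frame:03d}.png")
        )
    # Note: "visibility" holds Kubric's `occluded` flags unchanged (True = occluded)
    traj_annots = {"coords": data["target_points"], "visibility": data["occluded"]}
    np.save(os.path.join(dataset_dir, seq_num, seq_num + ".npy"), traj_annots)

    # visualize the tracks as an .mp4 next to the sequence directory
    disp = plot_tracks(
        data["video"] * 0.5 + 0.5, data["target_points"], data["occluded"]
    )
    media.write_video(os.path.join(dataset_dir, f"{seq_num}.mp4"), disp, fps=10)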

Then I just ran python dataset.py to create the point tracking annotations.
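To sanity-check the output of the script above, one sequence can be read back roughly as follows. This is a minimal sketch, not CoTracker's own data loader; the helper name `load_sequence` is hypothetical. It assumes the layout written above (`<dataset_dir>/<seq_num>/frames/*.png` plus `<dataset_dir>/<seq_num>/<seq_num>.npy`) and Kubric's shape convention of `[num_tracks, num_frames, 2]` for `target_points` and `[num_tracks, num_frames]` for `occluded`. Keep in mind that the `visibility` key holds Kubric's `occluded` flags exactly as the script wrote them.

```python
import os

import numpy as np


def load_sequence(dataset_dir, seq_num):
    """Hypothetical helper: read back one sequence written by the loop above.

    Returns the sorted frame paths, the track coordinates, and the flags stored
    under "visibility" (Kubric's `occluded` values, so True means occluded).
    """
    seq_dir = os.path.join(dataset_dir, seq_num)
    frames_dir = os.path.join(seq_dir, "frames")
    # Frames were saved as 000.png, 001.png, ... so sorted names give frame order
    frame_paths = [
        os.path.join(frames_dir, name)
        for name in sorted(os.listdir(frames_dir))
        if name.endswith(".png")
    ]
    # np.save stored a Python dict, which loads back as a 0-d object array
    annots = np.load(os.path.join(seq_dir, seq_num + ".npy"), allow_pickle=True).item()
    coords = np.asarray(annots["coords"])
    occluded = np.asarray(annots["visibility"])
    # Assumed Kubric shapes: coords [tracks, frames, 2], occluded [tracks, frames]
    if coords.shape[:2] != occluded.shape or coords.shape[-1] != 2:
        raise ValueError(f"unexpected shapes {coords.shape} and {occluded.shape}")
    if coords.shape[1] != len(frame_paths):
        raise ValueError(
            f"{len(frame_paths)} frames on disk but annotations cover {coords.shape[1]}"
        )
    return frame_paths, coords, occluded
```

For example, `load_sequence("./kubric_movi_f", "0000")` should return 24 frame paths if the sequence matches the `[N, 24]` shapes that appear in the error logs later in this thread.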

wwsource · 2023-11-14T07:56:54Z

Hi @nikitakaraevv,
I'm encountering the same problem that @Anderstask1 described in the previous post. When I ran python dataset.py, I got the following message, and no data was downloaded:

How can I solve this? Alternatively, could you provide the link to download the training data you used?
Thanks for your help!

nikitakaraevv · 2023-11-16T13:01:21Z

Hi @wwsource, I also had this problem. I had to install gsutil and first download the Kubric data locally with gsutil -m cp -r gs://kubric-public/tfds/movi_f ./kubric_movi_f.
Then I was able to load it with:

ds = tfds.load(
        "512x512",
        data_dir=f"./kubric_movi_f",
        shuffle_files=shuffle_buffer_size is not None,
        download=False,
        **kwargs,
    )
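With `download=False`, TFDS has to find an already-prepared copy under `data_dir`, so the snippet above appears to expect the files at `./kubric_movi_f/512x512/<version>/`. The following is a hypothetical pre-flight check in plain Python (no TFDS call; the name `find_prepared_versions` is made up) that relies on prepared TFDS datasets keeping a `dataset_info.json` in each version directory. If `./kubric_movi_f` already existed when `gsutil cp -r` ran, the copy may land in `./kubric_movi_f/movi_f/` instead, so the sketch also looks there.

```python
from pathlib import Path


def find_prepared_versions(data_dir, config="512x512"):
    """Hypothetical pre-flight check for a local TFDS copy of MOVI-f.

    Looks for <data_dir>/<config>/<version>/dataset_info.json and, as a fallback,
    <data_dir>/movi_f/<config>/<version>/dataset_info.json (nested copy).
    Returns the list of matching version directories.
    """
    candidates = [Path(data_dir) / config, Path(data_dir) / "movi_f" / config]
    found = []
    for config_dir in candidates:
        if not config_dir.is_dir():
            continue
        for version_dir in sorted(config_dir.iterdir()):
            # Prepared TFDS datasets keep their metadata in dataset_info.json
            if (version_dir / "dataset_info.json").is_file():
                found.append(version_dir)
    return found
```

For example, `print(find_prepared_versions("./kubric_movi_f"))`; an empty list, or a match only under `movi_f/`, presumably means `data_dir` or the dataset name in `tfds.load` does not match what is on disk.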

rogerioagjr · 2023-11-18T01:07:35Z

Hi @nikitakaraevv,

I simply downloaded the dataset.py script and updated the main function with the modifications you suggested.

When I tried running dataset.py, I got the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node__wrapped__IteratorGetNext_output_types_4_device_/job:localhost/replica:0/task:0/device:CPU:0}} Incompatible shapes at component 0: expected [2048,24] but got [924,24]. [Op:IteratorGetNext] name:

Then I changed tracks_to_sample=2048 to tracks_to_sample=924 when calling create_point_tracking_dataset. With that, it ran for 11 iterations of the loop, but eventually failed with the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node__wrapped__IteratorGetNext_output_types_4_device_/job:localhost/replica:0/task:0/device:CPU:0}} Incompatible shapes at component 0: expected [924,24] but got [810,24]. [Op:IteratorGetNext] name:

Do you have any suggestions on what I should do or what I might have done wrong?

Also, the file structure being generated seemed wrong for the train.py script: no kubric_movi_f/movi_f/frames directory was created, but instead kubric_movi_f/0000/frames, kubric_movi_f/0001/frames, kubric_movi_f/0002/frames, etc., one per iteration of the loop. This behavior is expected from dataset.py, since you do:

os.makedirs(os.path.join(dataset_dir, seq_num, "frames"), exist_ok=True)
for i_frame, frame in enumerate(data["video"]):
    Image.fromarray((((frame + 1) / 2.0) * 255.0).astype("uint8")).save(
        os.path.join(dataset_dir, seq_num, "frames", f"{i_frame:03d}.png")
    )

but I wanted to check that this is the correct file structure, since it seemed off.

Thank you!

nikitakaraevv · 2024-01-22T20:16:46Z

Hi @rogerioagjr, sorry, somehow I missed your message. Honestly, I don't know why the Kubric data is failing now. You might want to open an issue here.

As for your second question, seq_num corresponds to sequence numbers: 0000, 0001, 0002, ...
So it seems to be correct, right?
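In other words, the modified script writes one zero-padded directory per sequence directly under `dataset_dir`, plus an `.mp4` visualization next to each. A minimal, hypothetical check of that layout (the name `check_layout` does not come from either repository) that every sequence directory has a `frames/` folder and a matching `<seq_num>.npy`:

```python
import os


def check_layout(dataset_dir):
    """Hypothetical check of the per-sequence layout written by the modified dataset.py.

    Expects <dataset_dir>/<seq_num>/frames/ and <dataset_dir>/<seq_num>/<seq_num>.npy
    for every purely numeric sequence directory. Returns (sequence names, problems).
    The .mp4 visualizations written next to the directories are ignored.
    """
    sequences, problems = [], []
    for name in sorted(os.listdir(dataset_dir)):
        seq_dir = os.path.join(dataset_dir, name)
        if not (name.isdigit() and os.path.isdir(seq_dir)):
            continue
        sequences.append(name)
        if not os.path.isdir(os.path.join(seq_dir, "frames")):
            problems.append(f"{name}: missing frames/")
        if not os.path.isfile(os.path.join(seq_dir, name + ".npy")):
            problems.append(f"{name}: missing {name}.npy")
    return sequences, problems
```

An interrupted run typically leaves the last sequence incomplete, which this check would list under `problems`.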

XiaoyuShi97 · 2024-02-27T05:32:12Z

Hi @nikitakaraevv. May I ask how large the training data and the annotations are? And would it be possible for you to upload the generated data to ease the data preparation process? Thanks!

nikitakaraevv · 2024-02-28T16:23:58Z

Hi @XiaoyuShi97, it would be better to ask the creators of Kubric to do it, as we simply generated a dataset using their provided script.

sinkers-lan · 2024-03-01T06:11:22Z

Hi @XiaoyuShi97, in my experience it requires 500+ GB. Based on the explanation above, generating the annotations is not too difficult.

XiaoyuShi97 · 2024-03-17T14:55:46Z

Hi @nikitakaraevv, I generated the training data following the instructions, but the total size is only ~65 GB, which is much smaller than what @sinkers-lan reported. Could you please share the size of your training data? Thanks!

nikitakaraevv · 2024-03-17T21:05:20Z

Hi @XiaoyuShi97, that seems to be correct. 65 GB must be the size of the generated annotations, while the annotations plus the TensorFlow records take up 500+ GB.
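To check how the space splits on your own machine, here is a rough `du`-style sketch that sums file sizes under a directory and prints them in decimal gigabytes. The helper names are illustrative, and the usage path assumes the local layout used in this thread.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are not counted)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_gb(num_bytes):
    """Format a byte count in decimal gigabytes (1 GB = 10**9 bytes)."""
    return f"{num_bytes / 1e9:.1f} GB"
```

For example, `human_gb(dir_size_bytes("./kubric_movi_f/512x512"))` measures the downloaded TFRecords, while summing the numbered sequence directories gives the size of the generated annotations and frames.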

XiaoyuShi97 · 2024-03-18T06:19:28Z

Thanks for your kind reply!

pubyLu · 2024-05-26T10:52:11Z

I downloaded part of the kubric_movi_f/512x512 dataset, then ran create_point_tracking_dataset(), but got the following error:
  File "D:\A_research\tapnet-pytorch\tapnet-main\generateKubric\kubric_dataset.py", line 1036, in <module>
    main()
  File "D:\A_research\tapnet-pytorch\tapnet-main\generateKubric\kubric_dataset.py", line 1016, in main
    for i, data in enumerate(ds):
  File "D:\ProgramData\Anaconda3\envs\tapnet\lib\site-packages\tensorflow_datasets\core\dataset_utils.py", line 76, in _eager_dataset_iterator
    for elem in ds:
  File "D:\ProgramData\Anaconda3\envs\tapnet\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 809, in __next__
    return self._next_internal()
  File "D:\ProgramData\Anaconda3\envs\tapnet\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 772, in _next_internal
    ret = gen_dataset_ops.iterator_get_next(
  File "D:\ProgramData\Anaconda3\envs\tapnet\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 3050, in iterator_get_next
    _result = pywrap_tfe.TFE_Py_FastPathExecute(
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 274: invalid continuation byte
2\1.0.0\movi_f-train.tfrecord-00021-of-01024 GetLastError: 2

Why does this happen?
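The path in the last log line is cut off, but it names shard `00021-of-01024` of `movi_f-train.tfrecord`, and on Windows `GetLastError: 2` is `ERROR_FILE_NOT_FOUND`. Since only part of the dataset was downloaded, one plausible explanation (an assumption, not confirmed in this thread) is that TFDS still tries to read every train shard listed in the dataset metadata. A minimal sketch that lists the missing shard indices, assuming the `<prefix>-NNNNN-of-MMMMM` naming visible in the log; `missing_shards` is a hypothetical helper:

```python
import os
import re

# Matches names such as "movi_f-train.tfrecord-00021-of-01024" (pattern taken from the log above)
SHARD_RE = re.compile(r"^(?P<prefix>.+\.tfrecord)-(?P<idx>\d{5})-of-(?P<total>\d{5})$")


def missing_shards(version_dir, prefix="movi_f-train.tfrecord"):
    """Return the sorted shard indices for `prefix` that are absent from version_dir."""
    present, total = set(), None
    for name in os.listdir(version_dir):
        m = SHARD_RE.match(name)
        if m is None or m.group("prefix") != prefix:
            continue
        present.add(int(m.group("idx")))
        total = int(m.group("total"))
    if total is None:
        raise FileNotFoundError(f"no {prefix} shards found in {version_dir}")
    return sorted(set(range(total)) - present)
```

Running it on the local `512x512/1.0.0` directory shows which shards still need to be downloaded before iterating over the full train split.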

shivanimall · 2024-06-11T15:39:50Z

Thanks for the great work, and thanks @nikitakaraevv for sharing these instructions here. It might also be useful to cross-reference them on the Kubric repository.

qsisi · 2024-06-27T03:30:17Z

Hi @nikitakaraevv, I downloaded the TensorFlow records to local disk (only the 512x512 version of movi_f, which takes about 552 GB) and arranged them like this:

After I run python dataset.py, I get the following error:

My dataset.py is:

I must be messing up the directory or dataset name, etc. Could you give some hints?

qsisi · 2024-07-01T09:02:39Z

It has been solved.

qianduoduolr closed this as completed Jul 20, 2023

qianduoduolr reopened this Jul 20, 2023

qianduoduolr closed this as completed Jul 20, 2023

nikitakaraevv reopened this Nov 13, 2023

shivanimall mentioned this issue Jun 11, 2024

Point Tracking Dataset/Code Release google-research/kubric#253 (Closed)
