DV/HDV Metadata Reading, Chaptering & Splitting by Creation Date Code. #1664

Closed
4 of 7 tasks
harrypm opened this issue Jul 22, 2023 · 18 comments

Comments

@harrypm

harrypm commented Jul 22, 2023

I have a lot of issues to go through, so in order to make it easier for me to help you, I ask that you please try these things first

Description

Digital Video, or more accurately "DV", from MiniDV tapes can be captured as one continuous stream to a single file (an AVI, for example) instead of being broken up by clip into DV or MOV containers etc., ready for muxing to mkv/flac for archive.

This practice of single-file capture causes problems: you lose the date codes of each segment, MediaInfo only shows the first clip's information, and there is no chaptering. Old tools like DVdate exist, but they don't have universal system support and they fail because they do not read the ancillary stream data properly. However, all the time data is in the stream.

Funnily enough, I use this tool both for splitting large OBS files and for muxing audiobooks into chaptered single files, and it has only just occurred to me how useful this feature would be.

DVRescue has a similar tool to split files, however it is very unreliable.

Feature Request

  • Read DV File Metadata (MediaInfo)
  • Split by Creation Date Code (DV Analyzer/MediaInfo)
  • Import Chapters From DV Analyzer Export
@mifi
Owner

mifi commented Jul 22, 2023

Thanks for your request. I believe DV isn't really a widely used codec/format these days, so unless this can be done easily (TBH I don't even know how to implement any of these features), I don't think it will ever be implemented in LosslessCut. If you have any sample code showing how to read these 3 kinds of data, that could be a start. I'm closing this for now, but I can reopen if more information becomes available.

@mifi mifi closed this as completed Jul 22, 2023
@harrypm
Author

harrypm commented Jul 22, 2023

It's still widely used by consumers in developing nations, and the HDV 25p/29.97p spec is still very much alive alongside DVCPro etc., but it's all the same metadata-wise. Not to mention it has been used by consumers as both a production and a backup format since the late 90s. Now people are migrating 20GB tapes to 25GB archival optical discs (M-Disc, for example). It's a very popular codec for family archives, and this problem is also a very common one.

MediaInfo can read the date code and time code metadata. The issue is just finding the sub-clip markers in the header information at the beginning of each file for splitting, which might be possible to detect from a change in the sub-code timecode.
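A rough Python sketch of the "detect by timecode change" idea follows. It assumes you already have one sub-code timecode string per frame (how to obtain that list is not covered here), that the timecode is non-drop-frame `HH:MM:SS:FF`, and that the material is 25 fps PAL. The function names are made up for illustration. Note that timecode alone may miss boundaries where the camera carried the timecode on across recordings, so the recorded date would likely need checking too.

```python
def timecode_to_frames(tc: str, fps: int = 25) -> int:
    """Convert a non-drop-frame 'HH:MM:SS:FF' timecode to a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def find_clip_starts(timecodes: list[str], fps: int = 25) -> list[int]:
    """Return the frame indices where the timecode does not simply
    continue from the previous frame (i.e. a candidate sub-clip start)."""
    starts = [0] if timecodes else []
    for i in range(1, len(timecodes)):
        previous = timecode_to_frames(timecodes[i - 1], fps)
        current = timecode_to_frames(timecodes[i], fps)
        if current != previous + 1:
            starts.append(i)
    return starts
```

Each returned index could then be divided by the frame rate to get a split point in seconds.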

DVPackager is the tool DVRescue has for this, if that helps.

@mifi
Owner

mifi commented Jul 22, 2023

Could you provide the exact fields from MediaInfo that need to be extracted, and how to turn them into segments (start/end times)? MediaInfo isn't yet integrated into LosslessCut, but I've added the feature request to the MediaInfo feature request, as it depends on that: #1649

@harrypm
Copy link
Author

harrypm commented Jul 22, 2023

Example Readout from a sub-clip.

General
Complete name               : L:\To-Re-Encode\31.12.2001 Cats Test Tape (TDK Tape) _scene_63.avi
Format                      : AVI
Format/Info                 : Audio Video Interleave
Commercial name             : DV
Format settings             : BitmapInfoHeader / WaveFormatEx
File size                   : 4.42 MiB
Duration                    : 1 s 200 ms
Overall bit rate mode       : Constant
Overall bit rate            : 30.9 Mb/s
Frame rate                  : 25.000 FPS
Recorded date               : 2002-03-20 21:16:12.000

Video
ID                          : 0
Format                      : DV
Codec ID                    : dvsd
Codec ID/Hint               : Sony
Duration                    : 1 s 200 ms
Bit rate mode               : Constant
Bit rate                    : 24.4 Mb/s
Width                       : 720 pixels
Height                      : 576 pixels
Display aspect ratio        : 4:3
Frame rate mode             : Constant
Frame rate                  : 25.000 FPS
Standard                    : PAL
Color space                 : YUV
Chroma subsampling          : 4:2:0
Bit depth                   : 8 bits
Scan type                   : Interlaced
Scan order                  : Bottom Field First
Compression mode            : Lossy
Bits/(Pixel*Frame)          : 2.357
Time code of first frame    : 00:09:28:03
Time code source            : Subcode time code
Stream size                 : 4.12 MiB (93%)
Encoding settings           : ae mode=full automatic / wb mode=automatic / white balance= / fcm=manual focus

Audio
ID                          : 1
Format                      : PCM
Format settings             : Little / Signed
Codec ID                    : 1
Duration                    : 1 s 200 ms
Bit rate mode               : Constant
Bit rate                    : 1 024 kb/s
Channel(s)                  : 2 channels
Sampling rate               : 32.0 kHz
Bit depth                   : 16 bits
Stream size                 : 150 KiB (3%)
Alignment                   : Aligned on interleaves
Interleave, duration        : 40  ms (1.00 video frame)
Interleave, preload duratio : 80  ms

Fields of interest are, for example:

Recorded date : 2002-03-20 21:16:12.000

&

Time code of first frame    : 00:09:28:03
Time code source            : Subcode time code
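
To make the extraction concrete, here is a minimal Python sketch that pulls those fields out of MediaInfo's plain-text report. It assumes the report has the layout shown above (a section name on its own line, followed by `key : value` rows). The function name `parse_mediainfo_text` is made up for illustration and is not part of MediaInfo or LosslessCut.

```python
from datetime import datetime


def parse_mediainfo_text(report: str) -> dict[str, dict[str, str]]:
    """Group 'key : value' rows of a MediaInfo text report by section."""
    sections: dict[str, dict[str, str]] = {}
    current = ""
    for raw_line in report.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        key, sep, value = line.partition(" : ")
        if sep:
            sections.setdefault(current, {})[key.strip()] = value.strip()
        else:
            # A line without ' : ' is treated as a section header (General, Video, ...).
            current = line
            sections.setdefault(current, {})
    return sections


def recorded_date(sections: dict[str, dict[str, str]]) -> datetime:
    """Parse the 'Recorded date' field of the General section."""
    return datetime.strptime(sections["General"]["Recorded date"],
                             "%Y-%m-%d %H:%M:%S.%f")
```

The first-frame timecode would then be available as `sections["Video"]["Time code of first frame"]`.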

@harrypm
Author

harrypm commented Jul 22, 2023

As for "how to turn them into segments (start/end times)"

That's the big question. Each scene/segment, or more accurately sub-clip, only has record date metadata in the stream, but it has to be read. If the application could scan the entire file for such markers and then turn them into frame-accurate chapter markers, it could work.

DV Analyzer provides a readout like so:

(I'm an idiot, this exports a standard marker XML for Final Cut, so that can be easily integrated for chapter markers...)

DV Analyzer v.1.4.2 by AudioVisual Preservation Solutions, Inc. http://www.avpreserve.com

L:\To-Re-Encode\31.12.2001 Cats Test Tape (TDK Tape) .avi

Frame Count: 116883

Frame count with video error concealment: 2776 frames 
Total video error concealment:    319120 errors (  317500 "A" errors,     1620 "F" errors)
Frame count with CH1 audio error code: 783 frames 
Total audio error code for CH1:     16263 errors (    4005 Dseq=0,     1575 Dseq=1,     3780 Dseq=2,     1611 Dseq=3,     3672 Dseq=4,     1620 Dseq=5)
Frame count with DV timecode incoherency: 2 frames 
Frame count with Arbitrary bit inconsistency: 6 frames 

Absolute time	DV timecode range        	Recorded date/time range                         	Frame range
00:00:00.000	00:00:00:15 - 00:01:00:16	XXXX-XX-XX 00:00:00.000 - XXXX-XX-XX XX:XX:XX:XX	       0 -     1509
00:01:00.400	00:00:00:00 - 00:07:04:08	XXXX-XX-XX XX:XX:XX:XX - 2001-12-31 23:22:09    	    1510 -    12126
00:08:05.080	00:00:00:00 - 00:08:45:14	2001-12-31 23:28:13     - 2002-01-01 19:34:38    	   12127 -    25266
00:16:50.680	00:08:45:15 - 00:12:29:22	2002-01-01 13:31:24     - 2002-01-01 22:03:01    	   25267 -    30882
00:20:35.320	00:00:00:00 - 00:05:39:09	2002-01-02 14:27:10     - 2002-01-02 15:48:55    	   30883 -    39375
00:26:15.040	00:00:00:00 - 00:00:00:00	2002-01-02 22:30:22     - 2002-01-02 22:30:22    	   39376 -    39376
00:26:15.080	00:00:00:02 - 00:00:00:03	2002-01-02 22:30:22     - 2002-01-02 22:30:22    	   39377 -    39378
00:26:15.160	00:00:00:05 - 00:00:00:06	2002-01-02 22:30:22     - 2002-01-05 10:57:51    	   39379 -    39380
00:26:15.240	00:00:00:08 - 00:05:44:04	2002-01-05 10:57:51     - 2002-01-05 11:36:20    	   39381 -    47985
00:31:59.440	00:00:00:00 - 00:02:36:02	2002-01-05 13:18:43     - 2002-01-05 14:04:19    	   47986 -    51896
00:34:35.880	00:00:00:00 - 00:01:02:07	2002-01-05 16:39:22     - 2002-01-05 22:51:40    	   51897 -    53454
00:35:38.200	00:01:02:08 - 00:01:02:14	2002-01-05 16:40:24     - 2002-01-05 16:40:25    	   53455 -    53468
00:35:38.760	00:00:00:00 - 00:01:18:05	2002-01-05 22:53:17     - 2002-01-05 22:55:08    	   53469 -    55432
00:36:57.320	00:00:00:00 - 00:00:52:07	2002-01-16 21:17:04     - 2002-01-16 21:18:01    	   55433 -    56748
00:37:49.960	00:00:00:00 - 00:01:02:08	2002-01-20 20:06:37     - 2002-01-20 20:07:48    	   56749 -    58315
00:38:52.640	00:00:00:00 - 00:11:16:02	2002-01-30 18:34:52     - 2002-03-12 00:46:51    	   58316 -    75227
00:50:09.120	00:00:00:00 - 00:09:47:08	2002-03-14 20:27:57     - 2002-04-12 21:06:54    	   75228 -    89911
00:59:56.480	00:09:49:07 - 00:10:14:14	2002-04-12 21:06:56     - 2002-04-12 21:07:22    	   89912 -    90551
01:00:22.080	00:00:00:00 - 00:11:22:21	2002-04-12 21:11:47     - 2002-04-27 00:05:36    	   90552 -   107624
01:11:45.000	00:11:22:22 - 00:11:25:00	2002-04-25 22:59:57     - 2002-04-25 22:59:59    	  107625 -   107677
01:11:47.120	00:11:25:01 - 00:12:15:14	2002-04-25 23:00:00     - 2002-05-02 13:33:33    	  107678 -   108941
01:12:37.680	00:12:15:15 - 00:12:17:07	2002-04-25 23:00:50     - 2002-04-25 23:00:52    	  108942 -   108992
01:12:39.720	00:00:00:00 - 00:05:00:09	2002-05-02 13:40:27     - 2002-05-02 13:53:34    	  108993 -   116510
01:17:40.440	00:00:00:00 - 00:00:14:21	2002-05-02 13:54:14     - 2002-05-02 13:59:29    	  116511 -   116882

Percent of frames with Error: 2.69%
Percent of frames with Error (including Arbitrary bit inconsistency): 2.69%
Percent of frames with Video Error Concealment: 2.38%
Percent of frames with Audio Errors: 0.67%
Percent of frames with Timecode Incoherency: 0.00%
Percent of frames with Arbitrary bit inconsistency: 0.01%

Warning, frame count is maybe incoherant (reported by MediaInfo: 116882)
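
As a sketch of how the table above could become segments (start, end, name), the Python below reads each data row, takes the frame range as the source of truth, and divides by the frame rate. It assumes 25 fps (PAL, as in the MediaInfo readout) and that rows keep the four-column layout shown. `parse_summary` is a hypothetical name, and this is not necessarily how any existing importer works.

```python
import re

# One data row: absolute time, DV timecode range, recorded range, frame range.
ROW = re.compile(
    r"^\d{2}:\d{2}:\d{2}\.\d{3}\s+"             # absolute time (frames are used instead)
    r"\S+ - \S+\s+"                             # DV timecode range
    r"(?P<recorded>.+?)\s+"                     # recorded date/time range
    r"(?P<first>\d+)\s*-\s*(?P<last>\d+)\s*$"   # first and last frame (inclusive)
)


def parse_summary(text: str, fps: float = 25.0) -> list[dict]:
    """Turn DV Analyzer summary rows into segments with times in seconds."""
    segments = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, statistics, and blank lines are skipped
        first = int(match["first"])
        last = int(match["last"])
        segments.append({
            "start": first / fps,
            "end": (last + 1) / fps,  # last frame is inclusive
            "name": " ".join(match["recorded"].split()),
        })
    return segments
```

For the second row above (frames 1510 to 12126) this gives a start of 60.4 s, which agrees with the "Absolute time" column value of 00:01:00.400.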

@harrypm
Author

harrypm commented Jul 22, 2023

Here is some sample data of the export it provides.

By_Frame.txt

Summary.txt

FCP7_Interchange.xml.txt (rename to .xml)

@harrypm
Author

harrypm commented Jul 24, 2023

@mifi I've amended my request, as setting chapter markers for splitting is actually feasible: absolute frame information is provided by DV Analyzer. I can provide sample files for testing if need be.

@harrypm harrypm changed the title from "DV/HDV Metadata & Splitting Support" to "DV/HDV Metadata Reading, Chaptering & Splitting by Creation Date Code." on Jul 24, 2023
@mifi
Owner

mifi commented Jul 24, 2023

Cool. The only problem is that mediainfo.js doesn't seem to contain DV Analyzer, and I cannot find any native Node.js binding (gyp) that wraps MediaLib/DVAnalyzer. Do you know if it's possible to get DV analysis output from MediaInfo alone? If not, then someone probably has to either:

  1. make a fork or PR to https://github.com/buzz/mediainfo.js to also include DV Analyzer in the Emscripten build
  2. make a native Node.js binding for MediaLib/DVAnalyzer, to allow using it from Node.js/Electron

FCP7_Interchange.xml.txt (rename to .xml)

Not sure how this works, because the out markers always seem to be -1. Example:

<marker>
	<name>XXXX-XX-XX XX:XX:XX.XXX</name>
	<comment>REC_START</comment>
	<in>12</in>
	<out>-1</out>
	<color>
		<alpha>0</alpha>
		<red>0</red>
		<green>0</green>
		<blue>255</blue>
	</color>
</marker>

Not sure how this XML data would be transformed into segments. You can see here an example of the FCP XML files that LosslessCut currently supports:
https://github.com/mifi/lossless-cut/blob/master/src/fixtures/Final%20Cut%20Pro%20XMEML.xml

If you can provide a DV file, yes that would be helpful
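
One possible reading of the marker XML, sketched in Python: treat every marker's `<in>` frame as a segment start and let each segment run until the next marker's `<in>`, ignoring the `-1` in `<out>`. This is an assumption drawn from the single example above, not documented behavior of DV Analyzer or Final Cut. The function name, the `total_frames` parameter, and the 25 fps default are hypothetical.

```python
import xml.etree.ElementTree as ET


def markers_to_segments(xml_text: str, total_frames: int, fps: float = 25.0) -> list[dict]:
    """Build segments from point markers: each one ends where the next begins."""
    root = ET.fromstring(xml_text)
    markers = []
    for marker in root.iter("marker"):
        in_text = marker.findtext("in")
        if in_text is None:
            continue  # skip markers without a start frame
        name = (marker.findtext("name") or "").strip()
        markers.append((int(in_text), name))
    markers.sort()

    segments = []
    for index, (start_frame, name) in enumerate(markers):
        if index + 1 < len(markers):
            end_frame = markers[index + 1][0]
        else:
            end_frame = total_frames  # the last segment runs to the end of the file
        segments.append({
            "start": start_frame / fps,
            "end": end_frame / fps,
            "name": name,
        })
    return segments
```

The total frame count has to come from elsewhere (for instance the "Frame Count" line of the summary), since the markers themselves carry no end point.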

@harrypm
Author

harrypm commented Jul 24, 2023

I have emailed you the file I was using to make the above DV Analyzer files; I can make more media on request.

Sadly, MediaInfo only reads whatever is presented first in the stream header order, so the first clip is all that's shown. I have not found any deeper probing modes.

I think supporting the By_Frame.txt would be easier to use for marker import, no?

I'll make an issue ticket over on MediaInfo too and see if that goes anywhere, as this is a pretty useful thing someone should notice.

@mifi
Owner

mifi commented Jul 24, 2023

I'll make an issue ticket over on MediaInfo too and see if that goes anywhere, as this is a pretty useful thing someone should notice.

I doubt they would want to integrate DV Analyzer into MediaInfo, seeing as they already implemented it as a separate tool, but it could be worth a try, I guess.

I think supporting the By_Frame.txt would be easier to use for marker import, no?

LosslessCut could implement a custom parser/import format, yes, that would be much easier. It's still not obvious to me how to convert By_Frame.txt into a list of segments (start_time, end_time, label), though. And what's the difference between By_Frame.txt and Summary.txt, and which one should we allow importing? Also, I see that By_Frame.txt has a LOT of data, and I think 3000+ segments could be too much for LosslessCut to handle.

#1340

@harrypm
Author

harrypm commented Jul 24, 2023

Ah, I see now. By_Frame.txt also has the error log, reporting dropouts at the frame/audio sample level. Mostly these are not noticeable in the real world, unless they take out a whole row of frames and you get a blocky-looking segment, etc.

Summary.txt is accurate for the start/stop time information, so that would be the priority for practical use. Embedding the other info inside the exported clips (e.g. an mkv) would not be a bad idea, but it's non-critical.

mifi added a commit that referenced this issue Jul 25, 2023
@mifi
Owner

mifi commented Jul 25, 2023

I've implemented a DV Analyzer Summary.txt importer, will be included in the next version!

@harrypm
Author

harrypm commented Jul 25, 2023

Whoooo! That's wonderful to hear!

Now can we have automatic naming of segments by the recorded date field? (Then this is perfect.)

@mifi
Owner

mifi commented Jul 25, 2023

I'm setting the segment name to the Recorded date/time range column from the txt file:

Screenshot 2023-07-25 at 18 59 49

@harrypm
Author

harrypm commented Jul 25, 2023

Would it be possible to format it to use only the start date, so the segment output is like 2022.08.29_09.21.41.mkv, for example? Having both options would be ideal, but the latter would be more practical for real-world use.

@mifi
Owner

mifi commented Jul 25, 2023

Yes, that makes sense. I will also add each time value as tags, so they can be used in the output file name template like so:

${SEG_TAGS.recordedStart} ${SEG_TAGS.recordedEnd}
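
For illustration only, here is a small Python sketch of the date formatting requested above, turning a recorded start time into a name like 2022.08.29_09.21.41.mkv. It assumes the tag value looks like the timestamps in Summary.txt (`YYYY-MM-DD HH:MM:SS`), which is a guess about the tag format. It is not LosslessCut's template syntax, and the function name is made up.

```python
from datetime import datetime
from typing import Optional


def recorded_start_to_filename(recorded_start: str, ext: str = "mkv") -> Optional[str]:
    """Format 'YYYY-MM-DD HH:MM:SS' as 'YYYY.MM.DD_HH.MM.SS.<ext>'.

    Returns None when the value is not a usable timestamp, for example
    the 'XXXX-XX-XX XX:XX:XX:XX' placeholder seen on unreadable frames.
    """
    try:
        parsed = datetime.strptime(recorded_start.strip(), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return f"{parsed:%Y.%m.%d_%H.%M.%S}.{ext}"
```

Returning None for placeholder values leaves it to the caller to pick a fallback name, such as the segment number.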

mifi added a commit that referenced this issue Jul 25, 2023
@harrypm
Author

harrypm commented Jul 28, 2023

That is very helpful. Sadly, unlike digital kit today, it does not embed the camcorder ID, but we can't have everything 😂

I was wondering if I could get a test Windows exe or Linux deb to play with the new import feature? Thanks!

@mifi
Owner

mifi commented Aug 21, 2023

You can try the GitHub Actions artifact. It should be fixed now, see #1673
