date_gap_finder
searches for missing dates with in CSV files and optionally inserts new CSV entries for those missing dates.
Binaries for Windows, Mac, and Linux are available on the releases page.
I have many automated tasks that will append data to individual CSV files once per day. However, if for some reason that
task does not start, there is a possibility that I will not be notified of any errors. Each of these tasks are monitored via a
Grafana Dashboard
using the grafana-csv-datasource data source
plugin. This allows me to quickly view the status of all automated tasks. When a task has missed a day, date_gap_finder
will
insert CSV data in such a way that I will be notified that a task was not run.
This image displays the before and after of using date_gap_finder
. The Photo Import job runs every weekday. Notice that Monday, April 26
is missing from the Before image on the left. This could be easily missed since it occurs on a Monday (and the job isn't run on the weekends). The After image on the right shows what can be displayed when using date_gap_finder
to insert missing CSV data. It is now much easier to detect a date gap.
date_gap_finder searches for missing dates with in CSV files and optionally inserts CSV entries for those missing dates.
Usage:
date_gap_finder [command]
Available Commands:
help Help about any command
insert insert missing CSV entries
search search CSV files for missing dates
stats Show CSV file statistics
Flags:
-a, --amount int a maximum, numeric duration (default -1)
-B, --begin string only consider dates on or after this date
-c, --column int CSV column number (starts at zero)
-D, --debug int enable verbose debugging, set to 999 or 9999
-d, --delimiter string CSV delimiter (default ",")
-E, --end string only consider dates on or before this date
-f, --format string use this date/time format; See https://github.com/nleeper/goment#format
-h, --help help for date_gap_finder
-n, --noheader set if CSV file does not have header line
-p, --padding string add time to range before considering a gap between two dates (default "1s")
-S, --skipDays string skip comma-delimited set of fully spelled out days
-s, --skipWeekends allow gaps on weekends when set
-t, --tab use tab character as CSV delimiter
-u, --unit string unit of time, such as: days, hours, minutes
-v, --version version for date_gap_finder
This example file is called e.csv
, which should get updated once per day. On the 15th
, it got updated twice. There are no entries for the 17th
and 18th
.
Date | Errors | Warnings |
---|---|---|
2021-04-15 06:55:01 | 0 | 23 |
2021-04-15 08:30:26 | 1 | 22 |
2021-04-16 06:55:01 | 0 | 23 |
2021-04-19 06:55:01 | 2 | 21 |
__
- Search for gaps occurring more than
1 day
apart
$ date_gap_finder search -a 1 -u days e.csv
2021-04-17 06:55:01
2021-04-18 06:55:01
- Use -s to skip weekends
PS C:\> .\date_gap_finder.exe search -a 1 -u days -s .\e.csv
(no results, because the 17th and 18th occur on a Saturday and Sunday)
- Use -S to create a comma-separated list of days to skip
$ date_gap_finder search -a 1 -u days -S Sunday,Saturday e.csv
(no results, because the 17th and 18th occur on a Saturday and Sunday)
$ date_gap_finder search -a 1 -u days -S Sunday e.csv
2021-04-17 06:55:01
- Adds column data for missing dates
- CSV columns start 0
-
-r 1,1
- in column1
, insert a value of-1
-
-r 2,999
- in column2
, insert a value of999
-
- Note that multiple
-r
switches can be used
- Note that multiple
PS C:\> .\date_gap_finder.exe insert -a 1 -u days -r 1,-1 -r 2,999 .\e.csv
Date,Errors,Warnings
2021-04-15 06:55:01,0,23
2021-04-15 08:30:26,0,23
2021-04-16 06:55:01,0,23
2021-04-17 06:55:01,-1,999
2021-04-18 06:55:01,-1,999
2021-04-19 06:55:01,0,23
- Similar to above, but use one
-R
switch instead of two-r
switches -
-R 567
- excluding thenon-date
column, insert values of567
-
- Note that
-R
and-r
can be simultaneously used, with-r
having precedence over-R
- Note that
$ date_gap_finder insert -a 1 -u days -R 567 e.csv
Date,Errors,Warnings
2021-04-15 06:55:01,0,23
2021-04-15 08:30:26,0,23
2021-04-16 06:55:01,0,23
2021-04-17 06:55:01,567,567
2021-04-18 06:55:01,567,567
2021-04-19 06:55:01,0,23
- Use
-O
to overwrite the original file -
- a date-versioned backup file will be created containing the original file
- Use
-R
to add0
to all missing columns of all missing rows - Note that a backup file similar to this has been created:
e--20210506.162151.bak
-
- The naming convention is
filename--YYYYMMDD.HHMMSS.bak
- The naming convention is
PS C:> .\date_gap_finder.exe insert -a 1 -u days -R 0 -O .\e.csv
PS C:> cat e.csv
Date,Errors,Warnings
2021-04-15 06:55:01,0,23
2021-04-15 08:30:26,0,23
2021-04-16 06:55:01,0,23
2021-04-17 06:55:01,0,0
2021-04-18 06:55:01,0,0
2021-04-19 06:55:01,0,23
This example file is called f.csv
, which should get updated once per day. However, each entry is off by a few seconds. You can use the -p
switch to correct for this. It will pad time before and after the correct time. It is missing 5
days: 12-14
and 17-18
.
Date | Total |
---|---|
2021-03-10 18:40:01 | 317 |
2021-03-11 18:40:01 | 249 |
2021-03-15 18:40:04 | 287 |
2021-03-16 18:40:03 | 320 |
2021-03-19 18:40:06 | 102 |
__
- If a
6 second
time padding is not used, then8
rows will be returned. This is most likely not an accurate result since the times are only off by a few seconds. By using-p 6s
an accurate list of missing rows is returned. - Padding can end in
s
for seconds,m
for minutes, orh
for hours.Days
are not supported. - Note that
24 hours
is used instead of1 day
. - It is better to use time padding (-p) rather than using a longer time amount (-a)
$ date_gap_finder search -a 24 -u hours -p 6s f.csv
2021-03-12 18:40:01
2021-03-13 18:40:01
2021-03-14 18:40:01
2021-03-17 18:40:01
2021-03-18 18:40:01
This example file is called g.csv
. The date
column is in 1
instead of the normal column 0
. This file is also delimited by the tab
character instead of the comma
character. It has a date gap consisting of 04-15
through 04-18
.
Processed | Date |
---|---|
5125 | 2021-04-12 |
5197 | 2021-04-13 |
5206 | 2021-04-14 |
5222 | 2021-04-19 |
- Use
-d "\t"
to define the column delimiter -
- A short cut for the
tab
character is to simply use-t
instead of-d "\t"
- A short cut for the
- Use
-c
to denote column1
instead of the default of column0
- Note that
1440 minutes
is equal to1 day
PS C:\> .\date_gap_finder.exe search -a 1440 -u minutes -d "\t" -c 1 .\g.csv
2021-04-15
2021-04-16
2021-04-17
2021-04-18
- Similar switches to the above example, except that it is using the
insert
verb instead ofsearch
-r 0,9999
- in column0
, insert a value of9990
$ date_gap_finder insert -a 1440 -u minutes -t -c 1 -r 0,9999 g.csv
Processed Date
5125 2021-04-12
5197 2021-04-13
5206 2021-04-14
9999 2021-04-15
9999 2021-04-16
9999 2021-04-17
9999 2021-04-18
5222 2021-04-19
When using the insert
verb along with the -O
switch (which is used for overwriting the CSV file), a backup file will be created in the form of:
filename--YYYYMMDD.HHMMSS.bak
This backup file will be identical to the original CSV file. The number of backup files will continuously grow unless the -m
switch is used. For example, using -m 3
will limit the number of .bak
files to 3
.
$ date_gap_finder insert -a 1440 -u minutes -t -c 1 -r 0,9999 -m 3 g.csv
weekly.csv
is a CSV file with new entries occurring about once per week. The mode
is the duration that occurs most often in a CSV file.
PS C:> .\date_gap_finder.exe stats weekly.csv
+------------------------------+-----------------------+
| file | weekly.csv |
| records | 57 |
| columns | 6 |
| date/time layout | 06-Mar-2020 |
| average time between entries | 1 week 6 hours |
| total duration | 1 year 5 weeks 6 days |
| mode | 1 week (168h0m0s) |
| mode count | 54 |
+------------------------------+-----------------------+