[nitter] Can't use date in directory format without an error #5253

Closed
alnmy opened this issue Feb 29, 2024 · 6 comments

alnmy commented Feb 29, 2024

When attempting to use gallery-dl on "nitter/elonmusk" as an example, the extractor reports an error after the first video has been successfully downloaded and saved: "[nitterHostname][error] DirectoryFormatError: Applying directory format string failed (ValueError: Invalid format specifier '%Y%m%d' for object of type 'str')"

Full output with -v
~ $ gallery-dl -v nitter:http://nitterHostname/elonmusk
[gallery-dl][debug] Version 1.26.7
[gallery-dl][debug] Python 3.12.2 - Linux-6.5.13-x86_64-with
[gallery-dl][debug] requests 2.31.0 - urllib3 2.2.0
[gallery-dl][debug] Configuration Files ['/etc/gallery-dl.conf']
[gallery-dl][debug] Starting DownloadJob for 'nitter:http://nitterHostname/elonmusk'
[nitterHostname][debug] Using NitterTweetsExtractor for 'nitter:http://nitterHostname/elonmusk'
[urllib3.connectionpool][debug] Starting new HTTP connection (1): nitterHostname:80
[urllib3.connectionpool][debug] http://nitterHostname:80 "GET /elonmusk HTTP/1.1" 200 56940
[nitterHostname][debug] Active postprocessor modules: [MetadataPP]
[nitterHostname][debug] Skipping 1763064298052415835 (retweet)
[nitterHostname][debug] Skipping 1762993627742020024 (retweet)
[nitterHostname][debug] Skipping 1762991973307171073 (retweet)
[nitterHostname][debug] Skipping 1762913390450577459 (retweet)
[nitterHostname][debug] Skipping 1762865068671176773 (retweet)
[nitterHostname][debug] Skipping 1762883201523933203 (retweet)
./gallery-dl/twitter/elonmusk/20240228/elonmusk_0813_1762752859567763768_1.mp4
[nitterHostname][debug] Skipping 1762531618886234556 (retweet)
[urllib3.connectionpool][debug] http://nitterHostname:80 "GET /elonmusk?cursor=DAABCgABGHgaEQC__-gKAAIYdmacGlegHAgAAwAAAAIAAA HTTP/1.1" 200 80248
[nitterHostname][error] DirectoryFormatError: Applying directory format string failed (ValueError: Invalid format specifier '%Y%m%d' for object of type 'str')

/etc/gallery-dl.conf
{
    "extractor": {
        "nitter": {
            "quoted": false,
            "retweets": false,
            "videos": true,
            "directory": [
                "twitter", "{user['name']}", "{date:%Y%m%d}" 
            ],
            "filename": "{user['name']}_{date:%H%M}_{tweet_id}_{num}.{extension}",
            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "after"
                }
            ],
            "nitterHostname": {
                "root": "http://nitterHostname"
            }
        }
    }
}

I understand this may be hard to test, since few public Nitter instances are still available. Some can still be accessed, though, and are listed on https://status.d420.de/, and I'm willing to help where I can. It could also just be an error in my config.

alnmy (Author) commented Feb 29, 2024

I should note that other accounts download without any issues; it's only elonmusk's profile that fails after the first download.

Edit: this isn't true. It happens on other accounts too, but I'm not sure what triggers it. It's easiest to test on elonmusk's profile because there it happens almost instantly.

mikf (Owner) commented Feb 29, 2024

Please do NOT use these instances for scraping; host Nitter yourself.

Anyway, this happens when parsing fails and date is returned as plain str instead of a formattable datetime.
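To see this failure mode in isolation, here is a minimal plain-Python sketch. It does not use gallery-dl's formatter. The helper `parse_date`, its timestamp format, and its fallback of returning the raw string are assumptions chosen to mirror the behavior described here.

```python
from datetime import datetime


def parse_date(text, fmt="%Y-%m-%d %H:%M"):
    """Hypothetical parser: return a datetime, or the raw string on failure."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return text  # parsing failed, so the value stays a plain str


good = parse_date("2024-02-28 08:13")
bad = parse_date("Load newest")

# A datetime understands strftime-style format specifiers ...
print(format(good, "%Y%m%d"))

# ... but a str does not, which raises the same kind of ValueError as in the log
try:
    format(bad, "%Y%m%d")
except ValueError as exc:
    print("ValueError:", exc)
```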

I can't reproduce this error myself, so could you run it with --filter 'print(type(date), date)' and post the line(s) that don't start with <class 'datetime.datetime'>?
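As a rough illustration of why this diagnostic works, a filter is a Python expression evaluated with each file's metadata as its variables. The sketch below imitates that with `eval`. It is a simplification, not gallery-dl's implementation, and `passes_filter` is a hypothetical name. Because `print()` returns `None`, which is falsy, this particular filter also rejects every file while printing. That would explain why no file path appears in a run that uses it.

```python
from datetime import datetime

FILTER = "print(type(date), date)"


def passes_filter(expr, metadata):
    """Simplified stand-in for a metadata filter: eval expr against metadata."""
    return bool(eval(expr, {"datetime": datetime}, metadata))


files = [
    {"tweet_id": 1762752859567763768, "date": datetime(2024, 2, 28, 8, 13)},
]
for meta in files:
    # print() runs for its side effect and returns None, so nothing passes
    if passes_filter(FILTER, meta):
        print("would download", meta["tweet_id"])
```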

alnmy (Author) commented Feb 29, 2024

> I can't reproduce this error myself, so could you run it with --filter 'print(type(date), date)' and post the line(s) that don't start with <class 'datetime.datetime'>?

The output is pretty much identical to before. I also tried deleting the elonmusk folder and the output seems the same regardless.

Output (including the printed line, since it appears only once):
~ $ gallery-dl -v nitter:http://nitterHostname/elonmusk --filter 'print(type(date), date)'
[gallery-dl][debug] Version 1.26.7
[gallery-dl][debug] Python 3.12.2 - Linux-6.5.13-x86_64-with
[gallery-dl][debug] requests 2.31.0 - urllib3 2.2.0
[gallery-dl][debug] Configuration Files ['/etc/gallery-dl.conf']
[gallery-dl][debug] Starting DownloadJob for 'nitter:http://nitterHostname/elonmusk'
[nitterHostname][debug] Using NitterTweetsExtractor for 'nitter:http://nitterHostname/elonmusk'
[urllib3.connectionpool][debug] Starting new HTTP connection (1): nitterHostname:80
[urllib3.connectionpool][debug] http://nitterHostname:80 "GET /elonmusk HTTP/1.1" 200 56941
[nitterHostname][debug] Active postprocessor modules: [MetadataPP]
[nitterHostname][debug] Skipping 1763064298052415835 (retweet)
[nitterHostname][debug] Skipping 1762993627742020024 (retweet)
[nitterHostname][debug] Skipping 1762991973307171073 (retweet)
[nitterHostname][debug] Skipping 1762913390450577459 (retweet)
[nitterHostname][debug] Skipping 1762865068671176773 (retweet)
[nitterHostname][debug] Skipping 1762883201523933203 (retweet)
<class 'datetime.datetime'> 2024-02-28 08:13:00
[nitterHostname][debug] Skipping 1762531618886234556 (retweet)
[urllib3.connectionpool][debug] http://nitterHostname:80 "GET /elonmusk?cursor=DAABCgABGHgx3VI__-gKAAIYdmacGlegHAgAAwAAAAIAAA HTTP/1.1" 200 80248
[nitterHostname][error] DirectoryFormatError: Applying directory format string failed (ValueError: Invalid format specifier '%Y%m%d' for object of type 'str')
Output when non-verbose
<class 'datetime.datetime'> 2024-02-28 08:13:00
[nitter.ts.alnn.xyz][error] DirectoryFormatError: Applying directory format string failed (ValueError: Invalid format specifier '%Y%m%d' for object of type 'str')

alnmy (Author) commented Feb 29, 2024

I should also note that, for some reason, the issue seems to happen only with the directory name: when I put {date:%Y-%m-%d} in the filename instead, it doesn't happen.

Configuration file
{
    "extractor": {
        "nitter": {
            "quoted": false,
            "retweets": false,
            "videos": true,
            "directory": [
                "twitter", "{user['name']}" 
            ],
            "filename": "{user['name']}_{date:%Y%m%d}_{date:%H%M}_{tweet_id}_{num}.{extension}",
            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "after"
                }
            ],
            "nitterHostname": {
                "root": "http://nitterHostname"
            }
        }
    }
}

When running with the filter you requested, I don't get any lines that don't start with <class 'datetime.datetime'>.

mikf (Owner) commented Feb 29, 2024

Right, this doesn't actually work, since the directory format gets evaluated before --filter.

What you could do as a workaround is check the type of date and only format it when it's a datetime:

    "filename": {
        "isinstance(date, datetime)": "{user['name']}_{date:%H%M}_{tweet_id}_{num}.{extension}",
        ""                          : "{user['name']}_{date}_{tweet_id}_{num}.{extension}"
    },
    "directory": {
        "isinstance(date, datetime)": ["twitter", "{user['name']}", "{date:%Y%m%d}"],
        ""                          : ["twitter", "{user['name']}", "{date}"]
    }
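To show how such a condition mapping picks a format, here is a simplified plain-Python emulation. It is not gallery-dl's implementation: `build_directory` is a hypothetical name, and the `user['name']` segment is omitted for simplicity. As in the snippet, each key is an expression evaluated against the metadata, the first truthy one wins, and `""` acts as the fallback.

```python
from datetime import datetime

# Conditions are checked in insertion order; "" is the catch-all fallback.
DIRECTORY = {
    "isinstance(date, datetime)": ["twitter", "{date:%Y%m%d}"],
    "": ["twitter", "{date}"],
}


def build_directory(conditions, metadata):
    for expr, segments in conditions.items():
        # An empty expression is treated as "always true" (the fallback).
        if not expr or eval(expr, {"datetime": datetime}, metadata):
            return [seg.format(**metadata) for seg in segments]
    raise ValueError("no condition matched")


print(build_directory(DIRECTORY, {"date": datetime(2024, 2, 28, 8, 13)}))
print(build_directory(DIRECTORY, {"date": "unparsed"}))
```

The unparsed value no longer raises an error; it simply ends up verbatim in the path, which makes the affected post easy to spot.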

mikf (Owner) commented Feb 29, 2024

> I should also note that the issue seems to only happen with the directory name for some reason... when I have {date:%Y-%m-%d} in the filename it doesn't happen.

Now that sounds kind of impossible, since date should be the same for both.

There is a "tweet" with no files where parsing completely fails.

    show-more"><a href="/elonmusk">Load newest</a></div>
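One way an extractor could avoid this is to skip timeline fragments that carry no tweet date instead of treating them as tweets. The sketch below is hypothetical and is not the fix from the referenced commit. The `tweet-date` markup, the `title` attribute, and the timestamp format are all assumptions made for illustration only.

```python
import re
from datetime import datetime

# ASSUMPTION: a real tweet fragment contains a "tweet-date" link whose title
# attribute holds the timestamp. This markup is illustrative only.
DATE_RE = re.compile(r'class="tweet-date"><a [^>]*title="([^"]+)"')
DATE_FMT = "%Y-%m-%d %H:%M"  # hypothetical timestamp format


def parse_tweets(fragments):
    """Yield (fragment, datetime) pairs, skipping fragments that are not tweets."""
    for fragment in fragments:
        match = DATE_RE.search(fragment)
        if match is None:
            # e.g. the "Load newest" show-more link: it has no date, so it is
            # skipped instead of producing a "tweet" whose date is a plain str
            continue
        yield fragment, datetime.strptime(match.group(1), DATE_FMT)


fragments = [
    '<span class="tweet-date"><a href="/elonmusk/status/1762752859567763768"'
    ' title="2024-02-28 08:13">',
    '<div class="show-more"><a href="/elonmusk">Load newest</a></div>',
]
for fragment, date in parse_tweets(fragments):
    print(date.strftime("%Y%m%d"), fragment[:40])
```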

mikf added a commit that referenced this issue Feb 29, 2024
mikf added the site:bug label Feb 29, 2024
mikf closed this as completed Feb 29, 2024