Scraper only collects 118 byte files #49

Open
Pr0j3ct opened this issue Jul 22, 2024 · 16 comments
Pr0j3ct commented Jul 22, 2024

About two weeks ago the scraper started collecting only 118-byte files.

It does not appear to be IP-address related. Has the VSCO API changed?

@sideloading

Same issue here, see #48. I'm using https://github.com/mikf/gallery-dl, which is working fine.


Pr0j3ct commented Jul 25, 2024

One thing I noticed is that the sub-domain returns a 403:

i.vsco.co

but a URL of the form:

vsco.co/i

returns the image without a problem.

I'm no programmer, but when I have some free time I may try to refactor at least one of the modules to support that change and see what happens.
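As a rough illustration of that refactor idea, a hypothetical helper could rewrite the blocked sub-domain URLs into the working form. This is only a sketch; the assumption that the path carries over unchanged between i.vsco.co and vsco.co/i has not been confirmed against the VSCO API.

```python
from urllib.parse import urlparse


def rewrite_vsco_url(url):
    """Rewrite https://i.vsco.co/<path> (returns 403) into
    https://vsco.co/i/<path>, which still serves the image.

    Hypothetical sketch: assumes the path component is identical
    on both hosts.
    """
    parsed = urlparse(url)
    if parsed.netloc == "i.vsco.co":
        return f"https://vsco.co/i{parsed.path}"
    return url  # leave other URLs untouched
```

Any module that builds an i.vsco.co URL could pass it through a helper like this before downloading.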

@intothevoid33

@Pr0j3ct what do you mean?

I put a print statement into the script to see what it was trying to download. What it printed matched what I got when manually going to the gallery page, selecting an image, and then inspecting it.


parkerr82 commented Jul 26, 2024 via email


timbo0o1 commented Aug 1, 2024

Edit: it seems they block the default request headers used by the script.

You can simply set custom headers on your requests to get the images.

  1. Create a new entry in constants.py:
images = {
    'User-Agent': random.choice(user_agents),
    'Accept': 'image/avif,image/webp,image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5',
    'Accept-Language': 'de,en-US;q=0.7,en;q=0.3',
    'Connection': 'keep-alive',
    'Referer': 'https://vsco.co/',
    'Sec-Fetch-Dest': 'image',
    'Sec-Fetch-Mode': 'no-cors',
    'Sec-Fetch-Site': 'same-site',
    'Priority': 'u=4, i',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
}
  2. Use it in vscoscrape.py:
def download_img_normal(self, lists):
        if lists[2] is False:
            if f"{lists[1]}.jpg" in os.listdir():
                return True
            with open(f"{str(lists[1])}.jpg", "wb") as file:
                file.write(requests.get(lists[0], headers=constants.images, stream=True).content)
        else:
            if f"{lists[1]}.mp4" in os.listdir():
                return True
            with open(f"{str(lists[1])}.mp4", "wb") as file:
                for chunk in requests.get(lists[0],headers=constants.images, stream=True).iter_content(
                    chunk_size=1024
                ):
                    if chunk:
                        file.write(chunk)
        return True

Alternatively, you could use cloudscraper instead of Python requests.

pip install cloudscraper

import cloudscraper
class Scraper(object):
    def __init__(self, cache, latestCache):
        self.cache = cache
        self.latestCache = latestCache
        self.scraper = cloudscraper.create_scraper()
def download_img_journal(self, lists):
        """
        Downloads the journal media in specified ways depending on the type of media

        Since Journal items can be text files, images, or videos, I had to make 3
        different ways of downloading

        :params: lists - No idea why I named it this, but it's a media item
        :return: a boolean on whether the journal media was able to be downloaded
        """
        if lists[1] == "txt":
            with open(f"{str(lists[0])}.txt", "w") as file:
                file.write(lists[0])
        if lists[2] == "img":
            if f"{lists[1]}.jpg" in os.listdir():
                return True
            with open(f"{str(lists[1])}.jpg", "wb") as file:
                file.write(self.scraper.get(lists[0], stream=True).content)

        elif lists[2] == "vid":
            if f"{lists[1]}.mp4" in os.listdir():
                return True
            with open(f"{str(lists[1])}.mp4", "wb") as file:
                for chunk in self.scraper.get(lists[0], stream=True).iter_content(
                    chunk_size=1024
                ):
                    if chunk:
                        file.write(chunk)
        self.progbarj.update()
        return True
def download_img_normal(self, lists):
        """
        This function makes sense at least

        The if '%s.whatever' sections are to skip downloading the file again if it's already been downloaded

        At the time I wrote this, I only remember seeing that images and videos were the only things allowed

        So I didn't write an if statement checking for text files, so this would just skip it I believe if it ever came up
        and return True

        :params: lists - My naming sense was beat. lists is just a media item.
        :return: a boolean on whether the media item was downloaded successfully
        """
        if lists[2] is False:
            if f"{lists[1]}.jpg" in os.listdir():
                return True
            with open(f"{str(lists[1])}.jpg", "wb") as file:
                file.write(self.scraper.get(lists[0], stream=True).content)
        else:
            if f"{lists[1]}.mp4" in os.listdir():
                return True
            with open(f"{str(lists[1])}.mp4", "wb") as file:
                for chunk in self.scraper.get(lists[0], stream=True).iter_content(
                    chunk_size=1024
                ):
                    if chunk:
                        file.write(chunk)
        return True
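Whichever variant you use, a quick sanity check can confirm the fix worked: the blocked responses this issue is about are tiny (around 118 bytes), so a downloaded file that small is almost certainly the block page rather than real media. A minimal sketch follows; the 118-byte threshold comes from this thread, not from any VSCO documentation.

```python
def looks_blocked(content: bytes, threshold: int = 118) -> bool:
    """Heuristic: treat a response no larger than the ~118-byte
    block page reported in this issue as a failed download."""
    return len(content) <= threshold
```

You could call this on response.content after each download and retry or warn when it returns True.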

@intothevoid33

> Edit: Seems like they block the default request header which is used by the script. You could simply set a custom header to your requests to get the images. […]

That works perfectly, thank you!


spilla7 commented Aug 20, 2024

> Edit: Seems like they block the default request header which is used by the script.

Could someone please explain how to do this? I'd like to get this working again. I've tried gallery-dl but prefer vscoscraper.

@timbo0o1

> Could someone please explain how to do this? […]

I've already explained how to do this.
Where exactly do you need help?


spilla7 commented Aug 24, 2024

> I've already explained how to do this. Where exactly do you need help?

I can see where to add the text in the constants.py file, but I'm not sure where to add the text in the vscoscrape.py file.

I've tried adding it at the end, but I get an error message when I run the script.

Cheers

@AxelConceicao

> I can see where to add the text in the constants.py file, but I'm not sure where to add it in the vscoscrape.py file. […]

There's nothing to replace in constants.py; just add the images dict,
and add headers=constants.images like he did in the download_img_normal function.


billyklubb commented Aug 28, 2024

> Edit: Seems like they block the default request header which is used by the script. You could simply set a custom header to your requests to get the images. […] Alternatively you could use cloudscraper instead of the python requests. […]

Hey, so I am not a programmer in the least. The two files you are referring to, constants.py and vscoscrape.py: where are those located, and where are the new entries supposed to go in those files? Any help is sincerely appreciated!

Edit: when I look through the git repo for vsco-scraper I see the two files you are talking about, but I am not sure what I am supposed to do with them. I installed vsco-scraper with pip, so do I need to edit the source and perform a build/compile or something along those lines? Forgive me, I only know that vsco-scraper is in the bin folder of my Linux profile; after that I have zero idea what to do... =(


timbo0o1 commented Aug 28, 2024

> Hey, so I am not a programmer in the least, the first two files you are referring to constants.py and vscoscrape.py, where are those located? […]

If you installed vscoscrape with pip, the files are located in your Python installation.
Edit: to locate a pip package you can use the command "pip show vsco-scraper",
for example C:\Python310\Lib\site-packages\vscoscrape.
You will find both files there (constants.py / vscoscrape.py).
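If pip isn't handy on your PATH, the same directory can be found from Python itself. A small sketch; the package name vscoscrape is assumed to match what the pip package installs.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory of an installed package, or None if missing."""
    spec = importlib.util.find_spec(name)
    if spec and spec.origin:
        return os.path.dirname(spec.origin)
    return None


# e.g. package_dir("vscoscrape") should give the folder that
# contains constants.py and vscoscrape.py
```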

No need to build from source. Just use the pip package and do the following.
Open constants.py in your text editor and paste this at the end of the file:

images = {
    'User-Agent': random.choice(user_agents),
    'Accept': 'image/avif,image/webp,image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5',
    'Accept-Language': 'de,en-US;q=0.7,en;q=0.3',
    'Connection': 'keep-alive',
    'Referer': 'https://vsco.co/',
    'Sec-Fetch-Dest': 'image',
    'Sec-Fetch-Mode': 'no-cors',
    'Sec-Fetch-Site': 'same-site',
    'Priority': 'u=4, i',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
}

Now open vscoscrape.py and search for download_img_normal.
Select the whole function (down to its final "return True"),
then replace it with my version:

def download_img_normal(self, lists):
       
        if lists[2] is False:
            if f"{lists[1]}.jpg" in os.listdir():
                return True
            with open(f"{str(lists[1])}.jpg", "wb") as file:
                file.write(requests.get(lists[0], headers=constants.images, stream=True).content)
        else:
            if f"{lists[1]}.mp4" in os.listdir():
                return True
            with open(f"{str(lists[1])}.mp4", "wb") as file:
                for chunk in requests.get(lists[0],headers=constants.images, stream=True).iter_content(
                    chunk_size=1024
                ):
                    if chunk:
                        file.write(chunk)
        return True


billyklubb commented Aug 28, 2024

> If you installed vscoscrape with pip the files are located in your Python installation. […]

Thank you very much!! Those changes were easy enough. My first attempt gave me an indentation error; I just needed to move the "def download_img_normal(self, lists):" line over a tab stop to line up with all the others, and it ran without issue! I really appreciate your time! =)

Edit: I tested it for journals; it still produces the 118-byte files. I tried to sort it out, but the code block for journals is very different...

Edit: I figured it out. I looked for the function that downloads journals and added "headers=constants.images" to the jpg and mp4 lines, and it worked like a charm!

I'm certainly not a Python programmer now... lol, but reading through your code I see that constants.images must refer to the constants.py file, and the .images must refer to the images entry you had me add! Thanks for helping me see it! =)


bebunw commented Oct 1, 2024

Thanks very much @timbo0o1. I know there is gallery-dl, but it doesn't keep the same original filenames, and it was a pain for updating an old folder.


birizui commented Oct 25, 2024

> If you installed vscoscrape with pip the files are located in your Python installation. […]

Hey, thanks for the previous help. Unfortunately the script doesn't work again: when I run it, it shows '... crashed' for every username in my txt file. Please take a look... thank you.

@timbo0o1

> hey, thanks for the previous help. unfortunately the script doesn't work again. […]

Maybe take a look at #50.
