Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change type of access for methods #24

Closed
WinterSilence opened this issue Apr 21, 2016 · 34 comments
Closed

Change type of access for methods #24

WinterSilence opened this issue Apr 21, 2016 · 34 comments
Assignees

Comments

@WinterSilence
Copy link
Contributor

Change type from private to protected for extending classes.

@samdark
Copy link
Owner

samdark commented Apr 21, 2016

Which concrete method do you need and for which case? Blindly changing all methods to protected is a bad idea.

@WinterSilence
Copy link
Contributor Author

WinterSilence commented Apr 21, 2016

Это не плохая идея т.к. все методы\свойства взаимосвязаны. Я хочу хранить данные в кеше, а не в файлах, следовательно мне нужно переопределить минимум Index: write() + Sitemap: createNewFile(), flush(), но само собой эти методы работают с другими методами и переменными которые что? - правильно! закрыты...

@WinterSilence
Copy link
Contributor Author

WinterSilence commented Apr 21, 2016

As a variant: you can add a new branch and implement it in version that will return content (not saved in files).

@samdark
Copy link
Owner

samdark commented Apr 21, 2016

Опишите свой кейс. Для чего именно вам хранить sitemap в кеше? Как вы его отдаёте?

@WinterSilence
Copy link
Contributor Author

Кеш (https://github.com/kohana/cache).
Контроллер загружает из кеша \ генерирует sitemap xml и отдает клиенту. Обращений к sitemap немного, поэтому такой вариант намного удобнее чем cron задание или пересоздание при каждом изменении в списке ссылок.

@samdark
Copy link
Owner

samdark commented Apr 21, 2016

У вас XML в кеше?

@WinterSilence
Copy link
Contributor Author

У меня xml текст в файловом кеше и каких-либо проблем по этому поводу я не испытываю. Основная мысль - дать возможность пользователю самому решать как ему хранить данные, не вижу смысла в подобном ограничении расширяемости, можно провести небольшой рефакторинг и отделить момент генерации от сохранения контента.

@WinterSilence
Copy link
Contributor Author

У Вас висит пару месяцев PR #16, пользователь не может реализовать сжатие в своем варианте, поэтому вынужден расширять Ваш, а Вы как я понимаю этого не хотите. Как итог: смысл библиотеки теряется.

@samdark
Copy link
Owner

samdark commented Apr 21, 2016

Генерацию от сохранения отделить можно, да. Но всё делать protected я против.

@WinterSilence
Copy link
Contributor Author

WinterSilence commented Apr 21, 2016

Ты сам посмотри что в итоге останется private если сделать эти методы protected. Сам разделением заниматься будешь или мне попробовать?

@samdark samdark self-assigned this Apr 21, 2016
@samdark
Copy link
Owner

samdark commented Apr 22, 2016

Я уже начал над этим работать, но с удовольствием посмотрел бы на альтернативное решение.

@WinterSilence
Copy link
Contributor Author

Сомневаюсь что оно будет альтернативным, но ради интереса попробую.

@ejunker
Copy link

ejunker commented Apr 28, 2016

I want to extend the classes and override the write() and flush() method so that I can have it write to Amazon S3 instead of the local filesystem. I can't do that because the methods are marked private instead of protected. I think getCurrentFilePath() and $this->writer would also need to be protected.

@samdark
Copy link
Owner

samdark commented Apr 28, 2016

@ejunker I think it's better to abstract writing into separate interface / class. If you have ideas about it, let's discuss it.

@naumanjkhan
Copy link

naumanjkhan commented Mar 9, 2018

@samdark @WinterSilence Any news on this? We have a use case where we just want the xml to be rendered using output memory rather in a file. Right now there is no option to get sitemap instance without passing a file to it.

  1. get/set the writer so it can be used with/without file
  2. if file is not provided, it can assume $this->writer->openURI('php://output');
class NullWriter implements WriterInterface
{
    public function __construct()
    {
    }

    public function append($data)
    {
    }

    public function finish()
    {
    }
}

__construct($filePath = null, $useXhtml = false)

if ($filePath !== null) {
    $dir = dirname($filePath);

    if (!is_dir($dir)) {
        throw new \InvalidArgumentException(
            "Please specify valid file path. Directory not exists. You have specified: {$dir}."
        );
    }

    $this->filePath = $filePath;
    $this->useXhtml = $useXhtml;
}

createNewFile()

if ($filePath) {
    if ($this->useGzip) {
        if (function_exists('deflate_init') && function_exists('deflate_add')) {
            $this->writerBackend = new DeflateWriter($filePath);
        } else {
            $this->writerBackend = new TempFileGZIPWriter($filePath);
        }
    } else {
        $this->writerBackend = new PlainFileWriter($filePath);
    }
} else {
    $this->writerBackend = new NullWriter();
}

if ($this->writerBackend instanceof NullWriter) {
    $this->writer->openURI('php://output');
} else {            
    $this->writer->openMemory();
}
  1. setIndentString() implementation to set the spacing

That is the general idea. I have seen other issue where complete new solution is suggested. Any news on that if any implementation is being done or in process.

@samdark
Copy link
Owner

samdark commented Mar 9, 2018

No implementation was done or is in progress. While it's OK to have an ability to write to any stream, it's not clear what's your particular use case.

@naumanjkhan
Copy link

naumanjkhan commented Mar 9, 2018

@samdark I don't want to use file to store xml. Just want to render it to the user. Are you open to accept PR with suggested changes?

@samdark
Copy link
Owner

samdark commented Mar 9, 2018

That's a very bad idea:

  1. Sitemap doesn't change that often.
  2. You'll get quite big memory usage considering default buffering in most frameworks.
  3. You won't be able to match standards on number of URLs in sitemap and sitemap size.
  4. You won't be able to use multiple sitemaps.

@naumanjkhan
Copy link

naumanjkhan commented Mar 9, 2018

  1. Considering a dynamic site where 100s of new stories are pushed on daily basis
  2. It will follow the same buffering as right now
  3. It will match the number of urls allowed, will flush and will never be more than 50K
  4. While writing to output uri no multiple sitemaps required. If so will fall back to original file implementation. (Its an extension, nothing to dump the old functionality altogahter)
  5. Don't want to manage file on regular basis (file concurrency issues for crud possibility)
  6. Will cache stream and no server load to generate each time

@samdark
Copy link
Owner

samdark commented Mar 9, 2018

  1. That's fine. If you regenerate sitemap on publishing new story means only 200 regenerations. If you generate it dynamically each time it means significantly more.
  2. Currently it dumps into file steam releasing memory. In case of output buffering it won't release memory till buffer is flushed that, in case of modern frameworks, it is at the very end of request-response cycle.
  3. That contradicts "100s of new stories are pushed on daily basis". You'll reach the limit in at max 1 year and 3 months with such publishing rate.
  4. Wrong. Search engine spider doesn't care how you serve the content. It has a hard limit of 50000 URLs and 10 megabytes of payload. It you're pass the limit it will error not accepting your sitemap at all.

@naumanjkhan
Copy link

  1. Writer flush will handle it by itself, as stream is output uri, it will release as soon as writer flush is called
  2. We only get the newly published urls not all, so 50K will still not reach
  3. Same as 3

@samdark
Copy link
Owner

samdark commented Mar 9, 2018

  1. Yes, the problem is when it's called but yeah, that could be irrelevant and it's fine not to care about it.
  2. Huh? It doesn't make sense. Search engines, as far as I remember, expect a full sitemap, not partial.

@naumanjkhan
Copy link

  1. It will be a full sitemap with new urls. So whenever the urls are updated new urls will be indexed as old are already being crawled

@samdark
Copy link
Owner

samdark commented Mar 9, 2018

I think it's a bad idea to submit just new URLs and omit old ones: https://webmasters.stackexchange.com/questions/2459/should-i-include-everything-in-the-sitemap-or-only-new-content. No search engine guarantees that if your page reached index it will stay there forever.

@naumanjkhan
Copy link

Got it. But I think its a separate issue while thinking of writing xml to uri rather file system or memory.
Still need a way to render content to source uri. right?

@samdark
Copy link
Owner

samdark commented Mar 9, 2018

Well, it was designed to work with files. For example, it advanced to next file when current one is full: https://github.com/samdark/sitemap/blob/master/Sitemap.php#L227. I'm not sure how to do that if you've specified a stream that's no a file.

@naumanjkhan
Copy link

Nothing happens as said earlier, writer just flushes the stream and carries on with the processing

  1. $this->writerBackend instanceof NullWriter do nothing
  2. Rather making file or memory objects on fly, one should inject in __construct()
  3. Depending on type of injection respective createNewFile() or createSourceUri() can be called
  4. Injecting object can change the code a bit and will not be backward compatible
  5. Creating object on fly, like it does at the moment and instanceof check will do

@samdark
Copy link
Owner

samdark commented Mar 10, 2018

Carries on with processing where? For files it opens a new file. Continuing flushing to output in this case would result in incorrect sitemap that won't be parsed by search engines. It doesn't make sense to me.

Same about memory. You'll need a sitemap index and a set of sitemaps but these are read in multiple requests so you can't direct these into streams.

@naumanjkhan
Copy link

It will output all the stream to source uri there will be no new create file due to check of NullWriter.

@samdark
Copy link
Owner

samdark commented Mar 12, 2018

But that basically won't be read when you reach 50000 elements or a 10 MB of data and you have to write all URLs into sitemap.

@naumanjkhan
Copy link

Agreed, but we don't have a use case of 50K urls or 10MB of data. If that is the case fallback is always the original implementation i.e. File System

@samdark
Copy link
Owner

samdark commented Mar 12, 2018

If you have FS, why do you even need to generate anything runtime at all?

@naumanjkhan
Copy link

Right now, we don't have any need to use filesystem when we can manage on runtime. Handling File Read/write Concurrency not desired

@samdark
Copy link
Owner

samdark commented Mar 12, 2018

I see. Well, sorry but I don't think having a way to easily shoot into your own leg is a good idea. Therefore, do your own fork.

@samdark samdark closed this as completed Mar 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants