
Evicting pages from cache to disk #5

Closed
cesarghali opened this issue Apr 22, 2013 · 16 comments
@cesarghali
Collaborator

Here's the scenario, which is happening in private test case 7:

The test case adds 50 pages to the file. Since the cache holds only 10 frames, pages 0 to 9 are stored in the cache (in frames 0 to 9 respectively). At this point every page in the cache has been used once (meaning accessed once, for either read or write). Now when page 10 is added, LFU evicts frame 0 from the cache, and since its page is dirty it is written to disk. Page 10 is then stored in frame 0 with a usage count of 1. The problem occurs when page 11 is added to the cache: LFU again evicts frame 0 (because all frames have a usage count of 1), so page 10 (which is in frame 0) must be written to disk. That write fails because the file on disk contains only page 0.
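The eviction order in this scenario can be reproduced with a small model. This is only a sketch of the behavior described above — `Frame` and `lfuVictim` are hypothetical names, not our actual PF-layer code:

```cpp
#include <cstddef>
#include <vector>

// Minimal model of the scenario above. Each frame records the page it
// holds and how many times that page has been used; LFU evicts the frame
// with the lowest use count, breaking ties by lowest frame index.
struct Frame {
    int page;       // page number held in this frame
    unsigned uses;  // access count since the page entered the cache
};

// Returns the index of the frame LFU would evict next.
std::size_t lfuVictim(const std::vector<Frame>& frames) {
    std::size_t victim = 0;
    for (std::size_t i = 1; i < frames.size(); ++i)
        if (frames[i].uses < frames[victim].uses)
            victim = i;
    return victim;
}
```

With all use counts equal to 1, the victim is always frame 0 — so page 10, freshly loaded into frame 0, is the very next page chosen for eviction even though pages 1 through 9 have never been flushed.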

An easy and simple solution, which I don't like because it contradicts the requirements for project 1, is to remove the check that rejects a write when the page number is larger than the number of pages already in the file. This would allow empty pages between page 0 and page 10; however, since we know those pages are in the cache, they will eventually be written to disk at some point.

Another solution is, when writing pages to file, to consider the number of pages in the file and in the cache together. This would also allow empty pages between 0 and 10, but would guarantee that those pages are in the cache.

EDIT: For reference, here is the Piazza question: https://piazza.com/class#spring2013/cs222cs122c/62

@ghost ghost assigned diedthreetimes and ekinoguz Apr 22, 2013
@cesarghali
Collaborator Author

Another solution is to find all the pages between 0 and 9 and evict them from the cache to disk before writing page 10, but I think this is expensive.

@diedthreetimes
Collaborator

How about this: when determining the least frequently used page, do one of the following.

A: Only consider pages that are consecutive with what is already on disk, i.e. in your example page 10 would not be a candidate.
B: If the chosen page cannot be written to disk because it depends on other writes happening first, evict the earlier page instead.

@cesarghali
Collaborator Author

A: This works only when there is no pinning, which is our case. But it will be costly: every time I look for a candidate, I either have to seek to the end of the whole file to check its size on disk, or I have to keep track of that size in the cache as well.
B: This requires checking the whole cache for pages that must be evicted first (or keeping track of them). I don't think this is a simple task, because we need to differentiate between pages that already exist in the file and are cached for some other reason, and pages that were appended to the file but are still only in the cache. For instance, assume the file has pages 0, 1, 2, 3 on disk, and pages 1, 2, 3, 4, 5, 6 are in the cache. If you want to evict page 6, you only need to write pages 4 and 5 first; there is no need to write 1, 2, and 3. This means we also need to track, as part of the cache info, how many pages the file has on disk.
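The dependency rule in that example can be sketched as a small helper. This is illustrative only — `pagesToFlushFirst` is a hypothetical name, not project code:

```cpp
#include <set>
#include <vector>

// Sketch of the ordering check described above. Before evicting `page`,
// every appended page in [pagesOnDisk, page) that is still cached must be
// written first; pages below pagesOnDisk already exist in the file and
// impose no ordering constraint.
std::vector<int> pagesToFlushFirst(int page, int pagesOnDisk,
                                   const std::set<int>& cachedPages) {
    std::vector<int> deps;
    for (int p = pagesOnDisk; p < page; ++p)
        if (cachedPages.count(p))
            deps.push_back(p);
    return deps;
}
```

For the example above (pages 0–3 on disk, pages 1–6 in the cache), evicting page 6 requires flushing only pages 4 and 5 first.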

Btw, I implemented this solution: "when writing pages to file, consider the number of pages in the file and in the cache together. This will also allow for empty pages between 0 and 10, but will make sure that these pages are in the cache." It is working; do we need to invest time in one of the A or B solutions?

@diedthreetimes
Collaborator

It might be a good idea to keep the size in the cache anyway, so that no operations have to go to disk. (I'm not sure which operations require the size, but some might.) Both of the solutions I proposed assume that we store the number of pages on disk for every file, which should address both concerns.

Can you elaborate on your solution? If it's working though, I don't think it really matters, and you should just keep it :P

@cesarghali
Collaborator Author

OK, here's what I am doing:

When PF_Manager opens a file, it adds the file name and the number of pages in the file (obtained by seeking to the end of the file) to a map in the cache. Of course, when another fileHandle is opened on the same file, the size is not read from disk again. Whenever a new page is appended to this file by any fileHandle, the page count stored in the cache is incremented.

Now, assume the cache contains 10 pages of a specific file, and only one of them (page 0) has been physically written to disk. When we write page 9, WritePageToDisk checks the page count for this file that is tracked in the cache, and writes the page to the file even if there are empty pages in between. The reason this works is that if the cache says the file has 10 pages, then even though only 1 page is on disk, the others will eventually be written and fill the empty space in between. Unless the system crashes, in which case we might be left with empty pages. Are we dealing with crash and recovery?
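The bookkeeping described above boils down to something like the following. This is a sketch with hypothetical names (`CacheDirectory` and its methods are illustrative, not the committed code):

```cpp
#include <map>
#include <string>

// Sketch of the per-file page-count bookkeeping described above. The cache
// keeps one logical page count per open file; opens read the size from
// disk only once, appends bump the count, and the write path validates
// against this count rather than the physical on-disk size, so
// out-of-order flushes past the current end of file are allowed.
struct CacheDirectory {
    std::map<std::string, unsigned> pageCount;  // logical pages per file

    void onOpen(const std::string& file, unsigned pagesOnDisk) {
        // Only the first handle reads the size from disk.
        if (pageCount.find(file) == pageCount.end())
            pageCount[file] = pagesOnDisk;
    }

    void onAppend(const std::string& file) { ++pageCount[file]; }

    // A write is legal if the page exists logically, even when earlier
    // pages have not been flushed yet (leaving a temporary gap on disk).
    bool writeAllowed(const std::string& file, unsigned pageNum) const {
        std::map<std::string, unsigned>::const_iterator it =
            pageCount.find(file);
        return it != pageCount.end() && pageNum < it->second;
    }
};
```

So with page 0 on disk and pages 1–9 appended in the cache, writing page 9 passes the check while writing page 10 is still rejected.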

@ekinoguz
Owner

Quick resource for crash from project req: "For each operation, you should make sure that the "effect" of the operation (if any) has been stored in the file on disk. In case there is a power failure, the data is still on the disk. For example, for the "updateTuple" operation, after the function successfully returns, the updated record should physically reside in the file on the disk."

@cesarghali
Collaborator Author

Then this means that pages written to the cache should also be written to disk. In that case I think we don't need to keep track of dirty pages, because we always write to disk, right?

@diedthreetimes
Collaborator

Cesar: I think this seems fine.

Ekin: Where did you see this? I don't think this applies for the cache portion of the project. From what I gathered you can't have true crash recovery when using a cache unless you have a transactional logger.

@ekinoguz
Owner

There is another possibility for empty pages with that solution: the user can delete all the tuples in between, and the file can end up with just pages 1 and 10 and empty space between them.

@diedthreetimes: https://grape.ics.uci.edu/wiki/asterix/wiki/cs222-2013-spring-project2, memory requirements; do you want a screenshot?

@diedthreetimes
Collaborator

Cesar: Also, out of curiosity, how do you write the page with the extra space? Do you just seek past the end, or do you append many blank pages? Seeking past the end may cause problems in the future.

@diedthreetimes
Collaborator

Deleting tuples shouldn't cause deletion of a page should it?

@ekinoguz
Owner

We can seek to that position in the file, but we should remove `if (pageNum > GetNumberOfPages()) return -1` from WritePageToDisk().

@cesarghali
Collaborator Author

We should not remove it. With the version of the cache that I will commit in a minute, the whole issue is fixed.

Deleting tuples should not cause the deletion of pages. Pages that are empty will hold other tuples in the future. We should only delete pages when we reorganize the whole file (table).

@diedthreetimes
Collaborator

Maybe this is a question for the T.A. I'm with @cesarghali: I don't understand the use of dirty bits if we have to write after every operation.

@cesarghali
Collaborator Author

OK, I will post on Piazza :) this will increase my contributions :p

@ekinoguz
Owner

that was very quick 🎱
