
The incorrect results #5

Open
pocean opened this issue Aug 8, 2020 · 4 comments
pocean commented Aug 8, 2020

Dear code authors:
I'm using OptaneStudy, but I have encountered some problems with it.
I run it on Fedora 27 with Linux kernel 4.13.16. I configured the Optane using these commands:
1. Create pmem devices
ipmctl create -goal PersistentMemoryType=AppDirect
ndctl create-namespace
and got these namespaces:
[
  {
    "dev":"namespace3.0",
    "mode":"fsdax",
    "map":"dev",
    "size":133175443456,
    "uuid":"8824fbe5-6b52-4463-97b3-ee1c76cdf685",
    "blockdev":"pmem3"
  },
  {
    "dev":"namespace2.0",
    "mode":"fsdax",
    "map":"dev",
    "size":133175443456,
    "uuid":"904229c9-0cc0-425e-a288-c90cfd9a1ba8",
    "blockdev":"pmem2"
  }
]

2. Run the mount.sh script
sh /root/OptaneStudy/src/testscript/mount.sh /dev/pmem2 /dev/pmem3

3. Test Latency
echo task=1,op=0 > /proc/lattester

4. See the output using dmesg
dmesg | tail -n 100
and got output like this:
[ 266.020088] {0}[0]load-fence-64 avg 2729, stddev^2 21261321, max 58924, min 864 
[ 266.064615] {0}Running ntload-fence-64 
[ 268.347241] {0}[1]ntload-fence-64 avg 2728, stddev^2 2328676, max 22524, min 866 
[ 268.391851] {0}Running store-fence-64 
[ 269.926788] {0}[2]store-fence-64 avg 18446744073709550438, stddev^2 5080516, max 58952, min 18446744069414592274 
[ 269.994989] {0}Running store-clflush-64 
[ 270.868930] {0}[3]store-clflush-64 avg 18446744073709548557, stddev^2 19686969, max 53560, min 18446744069414585292

I'm afraid these results are incorrect. May I ask what is wrong with my testing process?
sheepx86 (Contributor) commented Aug 8, 2020

The rdtscp instruction (read TSC) is not a serializing instruction, so two rdtscp reads (even with mfence) can occasionally produce a negative difference. There are a few ways to mitigate this, but since we are trying to measure "memory latency", some of those methods would themselves affect the result.
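
(As a sanity check, the reported avg of 18446744073709550438 is exactly -1178 interpreted as an unsigned 64-bit value.) For context, the fully serialized measurement pattern recommended in the white paper linked below looks roughly like the following user-space sketch. This is not lattester's actual kernel code, just an illustration; the extra cpuid instructions add overhead of their own, which is one reason this kind of fix can distort short latency measurements:

#include <stdint.h>
#include <stdio.h>

/* cpuid serializes before the first rdtsc so earlier instructions cannot
 * leak into the timed region; rdtscp waits for the region to finish, and
 * the trailing cpuid keeps later instructions from being hoisted above it. */
static inline uint64_t tsc_begin(void)
{
    uint32_t hi, lo;
    __asm__ volatile("cpuid\n\t"
                     "rdtsc\n\t"
                     "mov %%edx, %0\n\t"
                     "mov %%eax, %1\n\t"
                     : "=r"(hi), "=r"(lo)
                     :
                     : "rax", "rbx", "rcx", "rdx");
    return ((uint64_t)hi << 32) | lo;
}

static inline uint64_t tsc_end(void)
{
    uint32_t hi, lo;
    __asm__ volatile("rdtscp\n\t"
                     "mov %%edx, %0\n\t"
                     "mov %%eax, %1\n\t"
                     "cpuid\n\t"
                     : "=r"(hi), "=r"(lo)
                     :
                     : "rax", "rbx", "rcx", "rdx");
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    uint64_t start = tsc_begin();
    /* ... code under measurement ... */
    uint64_t end = tsc_end();
    printf("%llu cycles\n", (unsigned long long)(end - start));
    return 0;
}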

What we did was simply disregard the outliers. See the parsing code at:
src/testscript/parsing/10_parse_basic.py
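
In spirit, the filtering amounts to something like this hypothetical C sketch (the real logic lives in the Python script above, and the cap value here is made up). A "negative" TSC delta stored in an unsigned 64-bit field wraps to a huge value near 2^64, so any sample beyond a sanity cap is dropped before averaging:

#include <stdint.h>
#include <stddef.h>

static uint64_t filtered_avg(const uint64_t *samples, size_t n)
{
    const uint64_t cap = 1000000;  /* made-up sanity cap, in cycles */
    uint64_t sum = 0, kept = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (samples[i] > cap)      /* wrapped negative or wild outlier */
            continue;
        sum += samples[i];
        kept++;
    }
    return kept ? sum / kept : 0;
}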

Reference:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf

pocean closed this as completed Aug 9, 2020
pocean (Author) commented Aug 9, 2020

@sheepx86 Thanks for your reply. I disregarded the outliers and got these results:
[ 203.766288] {0}Running load-fence-64
[ 206.145701] {0}[0]load-fence-64 avg 1620, stddev^2 37405456, max 9908, min 1000
[ 206.190273] {0}Running ntload-fence-64
[ 208.478466] {0}[1]ntload-fence-64 avg 2702, stddev^2 1695204, max 9988, min 868
My CPU is an Intel(R) Xeon(R) Gold 6246 CPU @ 3.30GHz, so I think the latency of "load-fence-64" is 1000/3.3 ≈ 303 ns. However, according to the paper, the latency of a sequential Optane load is ~169 ns. In addition, the random Optane read latencies I measured are similar, so it seems there is no buffer on the Optane.
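
(For reference, a minimal sketch of that conversion, assuming the TSC ticks at the nominal 3.3 GHz, which is the usual case on modern Xeons with an invariant TSC:)

#include <stdio.h>

int main(void)
{
    double min_cycles = 1000.0;  /* "min" reported for load-fence-64 */
    double tsc_ghz = 3.3;        /* assumed TSC frequency, cycles per ns */

    printf("latency = %.0f ns\n", min_cycles / tsc_ghz);  /* ~303 ns */
    return 0;
}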

pocean reopened this Aug 9, 2020
yzouchen commented Nov 2, 2022

My situation is exactly the same, and I have already tried setting the CPU to performance mode and turning off the cache prefetcher. I can't get results similar to those for task 1. Any suggestions?
